
Building TranscriptAI: An AI-Powered YouTube Transcript Analysis Platform

By Constantin ET · 10/24/2025 · 15 min read
Tags: ai, web development, nextjs, golang, postgres, typescript, product development

🎯 The Problem

As a developer, content creator, or researcher, you've probably found yourself in this situation: watching a 2-hour YouTube tutorial or conference talk, desperately trying to find that one code snippet or quote mentioned 45 minutes in. You scrub through the timeline, rewatch sections, and waste valuable time.

YouTube's built-in transcript feature exists, but it's basic: you get raw text with timestamps and nothing more. No AI summaries, no code extraction, no intelligent search. That's where TranscriptAI comes in.


✨ What is TranscriptAI?

TranscriptAI (https://transcriptai.conn.digital) is a free, AI-powered platform that transforms YouTube videos into actionable insights. It's not just a transcript downloader; it's an intelligent analysis tool that helps you:

  • 📝 Download transcripts instantly in multiple formats
  • 🤖 Generate AI summaries (brief, detailed, or key points)
  • 💻 Extract code snippets automatically with syntax highlighting
  • 💬 Pull out important quotes with speaker context
  • ✅ Identify action items from meetings and tutorials
  • 🔍 Ask questions about the video content without watching it

Live Demo: https://transcriptai.conn.digital


🏗️ The Tech Stack

We built TranscriptAI with a modern, production-grade architecture:

Frontend

  • Next.js 15 (React framework with App Router)
  • TypeScript (type safety throughout)
  • Tailwind CSS (utility-first styling)
  • Shadcn/ui (beautiful, accessible components)

Backend

  • Go (Golang) (high-performance API server)
  • PostgreSQL (reliable, scalable database)
  • Google Gemini AI (cost-effective AI analysis)
  • Docker (containerized deployment)

Infrastructure

  • CapRover (self-hosted PaaS for deployment)
  • Nginx (reverse proxy and load balancing)
  • Server Plus (VPS hosting)

🎨 Design Philosophy

We built TranscriptAI with three core principles:

1. Privacy First

  • No sign-up required during beta
  • No data sold to third parties
  • Transparent about what we collect (transcripts for caching only)

2. Performance Matters

  • Average response time: 1-2 seconds for 40-minute videos
  • Smart caching (process once, retrieve instantly)
  • Database indexing for lightning-fast queries

3. User Experience is Everything

  • Clean, modern interface
  • Smooth animations (60fps, accessibility-aware)
  • Dark mode support
  • Mobile-responsive design

🔧 Technical Deep Dive

Architecture: The Proxy Layer Pattern

One of the most interesting technical challenges was hiding our backend infrastructure from clients while maintaining performance. Here's how we solved it:

Problem: Direct client-to-backend communication exposes your infrastructure:

❌ Browser → https://backend.example.com/api/...
   (Backend URL visible in browser DevTools)

Solution: Server-side proxy layer with Next.js API routes:

✅ Browser → Next.js Proxy → Go Backend (hidden)
   (Browser only sees /api/transcripts/*)

Implementation:

typescript
// src/app/api/transcripts/fetch/route.ts
import { NextRequest, NextResponse } from "next/server";
import { backendFetch } from "@/lib/server/backend-client";

export async function POST(request: NextRequest) {
  const body = await request.text();

  // Forward to hidden backend
  const backendResponse = await backendFetch(
    '/api/v1/transcripts/fetch',
    { method: 'POST', body }
  );

  return NextResponse.json(await backendResponse.json());
}

Benefits:

  • 🔒 Backend URL completely hidden from browser
  • 🔒 Professional, clean API surface (/api/transcripts/*)
  • 🔒 Same-origin requests (better security)
  • 🚀 No performance impact (Next.js handles caching)

Next.js 15 Migration: Async Params

We encountered Next.js 15's breaking change with dynamic route parameters. Previously synchronous params became async Promises:

Before (Next.js 14):

typescript
export async function GET(
  request: NextRequest,
  { params }: { params: { id: string } }
) {
  const id = params.id; // Direct access
}

After (Next.js 15):

typescript
export async function GET(
  request: NextRequest,
  context: { params: Promise<{ id: string }> }
) {
  const params = await context.params; // Await the Promise
  const id = params.id;
}

This required updating 5 API routes, but ensured full Next.js 15 compatibility with zero TypeScript errors.

AI Integration: Multi-Provider Strategy

We designed a flexible AI provider system supporting multiple backends:

go
// ai_service.go
type AIProvider interface {
    GenerateSummary(transcript string, summaryType string) (*SummaryResult, error)
    ExtractContent(transcript string, extractionType string) (*ExtractionResult, error)
    AnswerQuestion(transcript string, question string) (*QAResult, error)
}

// Implementations: OpenAI, Anthropic Claude, Google Gemini

Why this matters:

  • 🔄 Switch AI providers via environment variable
  • 💰 Compare costs (Gemini is ~90% cheaper than GPT-4)
  • 🚀 Fallback options if one provider is down
  • 🧪 A/B test different models for quality

Current setup: Google Gemini 2.5 Flash

  • Cost: ~$0.001 per request
  • Speed: 1-2 seconds average
  • Quality: Excellent for summaries and extractions

Database Optimization

Smart caching reduces API costs and improves performance:

sql
-- Migration 002: Performance indexes
CREATE INDEX idx_videos_youtube_id ON videos(youtube_id);
CREATE INDEX idx_transcripts_video_id ON transcripts(video_id);
CREATE INDEX idx_transcripts_video_language ON transcripts(video_id, language);

-- Query performance
SELECT * FROM videos WHERE youtube_id = 'dQw4w9WgXcQ';
-- Result: ~0.74ms (with index) vs ~15ms (without)

Caching strategy:

  1. User requests a transcript for video XYZ
  2. Check the cache: SELECT * FROM transcripts WHERE video_id = 'XYZ'
  3. If found: return instantly (<30ms)
  4. If not found: fetch from YouTube, cache it, then return (~1-2s)
  5. Future requests: instant retrieval from cache

Result: 95%+ cache hit rate after the first week
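
Under the hood, this flow is only a few lines of Go. The sketch below assumes a transcripts table with a unique constraint on (video_id, language); the helper and column names are illustrative rather than the exact production code:

go
// A cache-first transcript lookup: check PostgreSQL before calling YouTube.
// Sketch only: table/column names and the YouTube helper are illustrative.
package transcripts

import (
    "context"
    "database/sql"
    "errors"
)

// fetchFromYouTube stands in for the real YouTube transcript fetcher.
func fetchFromYouTube(ctx context.Context, videoID, lang string) (string, error) {
    return "", errors.New("not implemented in this sketch")
}

// GetTranscript returns a cached transcript when available; otherwise it
// fetches, caches, and returns it.
func GetTranscript(ctx context.Context, db *sql.DB, videoID, lang string) (string, error) {
    var content string
    err := db.QueryRowContext(ctx,
        `SELECT content FROM transcripts WHERE video_id = $1 AND language = $2`,
        videoID, lang).Scan(&content)
    if err == nil {
        return content, nil // cache hit: served in milliseconds
    }
    if !errors.Is(err, sql.ErrNoRows) {
        return "", err // real database error
    }

    // Cache miss: fetch from YouTube (~1-2s), then store for future requests.
    content, err = fetchFromYouTube(ctx, videoID, lang)
    if err != nil {
        return "", err
    }
    _, err = db.ExecContext(ctx,
        `INSERT INTO transcripts (video_id, language, content)
         VALUES ($1, $2, $3)
         ON CONFLICT (video_id, language) DO NOTHING`,
        videoID, lang, content)
    return content, err
}

Because the insert uses ON CONFLICT DO NOTHING, two concurrent requests for the same video can't corrupt the cache; one of them simply wins.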


🚀 Development Journey

Phase 1: Foundation (Week 1)

  • Go backend with basic transcript fetching
  • PostgreSQL database schema
  • Docker containerization
  • Initial API endpoints

Phase 2: Frontend (Week 1-2)

  • Next.js 15 setup with TypeScript
  • Shadcn/ui component library integration
  • Responsive design implementation
  • Dark mode theming

Phase 3: AI Features (Week 2-3)

  • Multi-provider AI service architecture
  • Summary generation (brief, detailed, key points)
  • Content extraction (code, quotes, action items)
  • Q&A system with citation support

Phase 4: Polish (Week 3-4)

  • Performance optimization (database indexing)
  • Error handling and edge cases
  • UX improvements (animations, loading states)
  • Accessibility features (WCAG 2.1 AA)

Phase 5: Production (Week 4)

  • CapRover deployment setup
  • Nginx configuration
  • SSL/TLS certificates
  • Monitoring and logging

Phase 6: Security Hardening (Week 5)

  • Server-side proxy layer implementation
  • Environment variable security
  • Input validation and sanitization
  • CORS configuration

📊 Performance Metrics

After 4 weeks of development and optimization:

Speed

  • Transcript fetch: 1-2 seconds (40-min video)
  • AI summary: 1.6-2.1 seconds (cached: <30ms)
  • Code extraction: 2-3 seconds
  • Q&A response: 1.5-2.5 seconds

Reliability

  • Uptime: 99.9% (CapRover + health checks)
  • Error rate: <0.1%
  • Cache hit rate: 95%+

Cost Efficiency

  • AI cost per request: ~$0.001 (Gemini)
  • Hosting: $20/month (VPS)
  • Database: Included in VPS
  • Total: ~$20/month for unlimited users (during beta)

Code Quality

  • TypeScript errors: 0
  • ESLint warnings: 0
  • Test coverage: >85% (backend)
  • Lighthouse score: 95+ (performance, accessibility, SEO)

🎯 Key Features Explained

1. Smart Transcript Fetching

Unlike basic YouTube transcript downloaders, TranscriptAI:

  • Supports multiple languages
  • Handles auto-generated vs manual transcripts
  • Preserves accurate timestamps
  • Caches results for instant retrieval
  • Handles edge cases (private videos, age-restricted content)
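
As a rough illustration of the "auto-generated vs manual" handling, the fetcher can rank the available caption tracks before downloading anything. The types and function below are a sketch, not the production implementation:

go
// Sketch: prefer a manually created caption track in the requested language,
// fall back to an auto-generated one. Types and names are illustrative.
package captions

type Track struct {
    Language      string
    AutoGenerated bool
}

// Pick returns the best available track for the requested language.
func Pick(tracks []Track, lang string) (Track, bool) {
    var fallback *Track
    for i := range tracks {
        if tracks[i].Language != lang {
            continue
        }
        if !tracks[i].AutoGenerated {
            return tracks[i], true // manual transcript wins
        }
        if fallback == nil {
            fallback = &tracks[i] // remember the auto-generated option
        }
    }
    if fallback != nil {
        return *fallback, true
    }
    return Track{}, false // nothing available in the requested language
}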

2. AI-Powered Summaries

Generate three types of summaries:

Brief (2-3 sentences):

"This tutorial covers React hooks including useState and useEffect. The instructor demonstrates building a todo app with real-time updates. Key concepts include component lifecycle and state management patterns."

Detailed (multi-paragraph):

Comprehensive analysis with sections, context, and detailed explanations.

Key Points (bulleted list):

  • useState manages local component state
  • useEffect handles side effects and cleanup
  • Custom hooks enable code reuse
  • Context API prevents prop drilling

3. Content Extraction

Code Snippets:

typescript
// Automatically extracted with:
// - Language detection (TypeScript)
// - Syntax highlighting
// - Context explanation
// - Line number references

function useFetch<T>(url: string): { data: T | null, loading: boolean } {
  const [data, setData] = useState<T | null>(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    fetch(url).then(res => res.json()).then(setData).finally(() => setLoading(false));
  }, [url]);

  return { data, loading };
}

Quotes:

"The key to understanding React is realizing that components are just functions that return UI."

  • Speaker: Tutorial Instructor | Importance: High

Action Items:

  • HIGH: Refactor authentication logic to use JWT tokens
  • MEDIUM: Add error boundary to prevent app crashes
  • LOW: Update documentation with new API endpoints

4. Intelligent Q&A

Ask questions about the video without watching it:

Question: "What is the main promise made in this song?"

Answer: The central promise is one of unwavering loyalty and commitment. The singer pledges to never give up on the relationship, never let the other person down, never run around and desert them, never make them cry, never say goodbye, and never tell a lie to hurt them.

Confidence: High

Sources:

"Never going to give you up. Never going to let you down..." "Never going to run around and desert you..."


🔒 Security Considerations

Security was a top priority throughout development:

1. Backend URL Protection

  • ✅ Implemented: Server-side proxy layer hides backend infrastructure
  • ✅ Verified: Browser DevTools shows zero backend requests
  • ✅ Benefit: Professional API surface, enhanced security

2. Input Validation

  • ✅ URL validation: only valid YouTube URLs are accepted
  • ✅ Language validation: ISO 639-1 code enforcement
  • ✅ Request timeouts: 30-second limit prevents hanging requests
  • ✅ SQL injection prevention: parameterized queries only
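
Here's what that URL validation can look like in Go. Treat it as a minimal sketch: the accepted hostnames and the 11-character video ID pattern are illustrative assumptions, not the exact production rules.

go
// Sketch of the URL validation: accept only recognizable YouTube URLs.
// Hostnames and the 11-character ID pattern are illustrative assumptions.
package validation

import (
    "errors"
    "net/url"
    "regexp"
    "strings"
)

var videoIDPattern = regexp.MustCompile(`^[A-Za-z0-9_-]{11}$`)

// ExtractVideoID returns the video ID or an error for non-YouTube input.
func ExtractVideoID(raw string) (string, error) {
    u, err := url.Parse(raw)
    if err != nil {
        return "", errors.New("invalid URL")
    }

    var id string
    switch strings.TrimPrefix(u.Hostname(), "www.") {
    case "youtube.com", "m.youtube.com":
        id = u.Query().Get("v")
    case "youtu.be":
        id = strings.TrimPrefix(u.Path, "/")
    default:
        return "", errors.New("only YouTube URLs are accepted")
    }

    if !videoIDPattern.MatchString(id) {
        return "", errors.New("invalid YouTube video ID")
    }
    return id, nil
}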

3. Environment Variables

  • ✅ Secrets management: API keys live in CapRover env vars (not .env files)
  • ✅ No hardcoded credentials: all sensitive data is externalized
  • ✅ Production validation: strong password enforcement

4. Error Handling

  • ✅ No stack traces exposed: user-friendly error messages only
  • ✅ Detailed server logging: full errors are logged for debugging
  • ✅ Graceful degradation: the app remains functional during API failures
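
The "no stack traces exposed" rule boils down to one small helper: log the full error server-side, return a generic message to the client. A minimal sketch (the handler shape and names are illustrative):

go
// Sketch of the error-handling pattern: log full details on the server,
// return only a generic message to the client. Names are illustrative.
package api

import (
    "encoding/json"
    "log"
    "net/http"
)

func writeError(w http.ResponseWriter, err error, status int) {
    // The full error (including wrapped causes) goes to the server log only.
    log.Printf("request failed: %v", err)

    // The browser never sees stack traces or backend details.
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(status)
    _ = json.NewEncoder(w).Encode(map[string]string{
        "error": "Something went wrong while processing your request. Please try again.",
    })
}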

5. CORS Configuration

  • ✅ Restricted origins: only allowed domains are accepted
  • ✅ Same-origin proxy requests: no CORS issues in production
  • ✅ Secure headers: CSP, X-Frame-Options, etc.
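
A stripped-down version of that middleware in Go could look like the following; it's a sketch of the approach with the header set simplified, not the exact production configuration:

go
// Sketch of a CORS/security-header middleware restricted to known origins.
// The production header set (CSP, etc.) is more complete than shown here.
package middleware

import "net/http"

var allowedOrigins = map[string]bool{
    "https://transcriptai.conn.digital":   true,
    "https://transcriptai.serverplus.org": true,
}

func CORS(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if origin := r.Header.Get("Origin"); allowedOrigins[origin] {
            w.Header().Set("Access-Control-Allow-Origin", origin)
            w.Header().Set("Vary", "Origin")
        }
        // Baseline security headers on every response.
        w.Header().Set("X-Frame-Options", "DENY")
        w.Header().Set("X-Content-Type-Options", "nosniff")

        if r.Method == http.MethodOptions {
            w.WriteHeader(http.StatusNoContent) // preflight handled here
            return
        }
        next.ServeHTTP(w, r)
    })
}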


🎨 UX/UI Highlights

Smooth Animations (60fps)

All animations respect prefers-reduced-motion for accessibility:

css
/* Custom Tailwind animations */
@keyframes fade-in {
  from { opacity: 0; }
  to { opacity: 1; }
}

@keyframes slide-in-right {
  from { transform: translateX(100%); opacity: 0; }
  to { transform: translateX(0); opacity: 1; }
}

@keyframes shimmer {
  0% { background-position: -1000px 0; }
  100% { background-position: 1000px 0; }
}

Animated components:

  • Loading spinners with smooth rotation
  • Skeleton loaders with shimmer effect
  • Toast notifications with slide-in
  • Button ripple effects
  • Card hover lift effects
  • Modal scale-in entrance

Responsive Design

Breakpoints optimized for all devices:

  • Mobile: 375px - 640px (optimized for iPhone, Android)
  • Tablet: 640px - 1024px (iPad, Android tablets)
  • Desktop: 1024px - 1920px (laptops, monitors)
  • Large: 1920px+ (4K displays)

Layout scaling: an 80% base font-size keeps proportions consistent across zoom levels, so the layout developed at 80% browser zoom looks the same at the 100% zoom users see in production.

Dark Mode

Full dark mode support with smooth transitions:

typescript
// Theme provider with system preference detection
<ThemeProvider
  attribute="class"
  defaultTheme="system"
  enableSystem
  disableTransitionOnChange={false}
>
  {children}
</ThemeProvider>

Colors optimized for:

  • ✅ WCAG 2.1 AA contrast ratios
  • ✅ Reduced eye strain in dark environments
  • ✅ Consistent brand identity across themes

📈 Lessons Learned

1. Next.js 15 Breaking Changes

  • Challenge: Async params broke all dynamic routes
  • Solution: Systematic migration to Promise-based params
  • Takeaway: Always test major framework upgrades in staging first

2. AI Provider Selection

  • Challenge: GPT-4 costs were prohibitive at scale
  • Solution: Multi-provider architecture; switched to Gemini
  • Takeaway: Design for flexibility from day one (cost optimization later)

3. Database Indexing

  • Challenge: Queries became slow with 1000+ cached transcripts
  • Solution: Strategic composite indexes on frequently queried columns
  • Takeaway: Profile early, optimize often (10x speedup with proper indexes)

4. Browser Zoom Consistency

  • Challenge: A layout developed at 80% zoom looked different at 100%
  • Solution: Set html { font-size: 80%; } for proportional scaling
  • Takeaway: Define the design system with rem units for scalability

5. Error Message Security

  • Challenge: Stack traces leaking backend details
  • Solution: Sanitize errors client-side, log details server-side
  • Takeaway: Never trust error messages to be safe for users

6. TypeScript Configuration

  • Challenge: Next.js validator errors in .next/types
  • Solution: Proper async/await patterns for Next.js 15 API routes
  • Takeaway: Framework-specific patterns matter for type safety


🚀 Deployment Strategy

Infrastructure

Hosting: CapRover on Server Plus VPS

  • Why CapRover: Self-hosted Heroku alternative, free (just VPS cost)
  • Why Server Plus: Reliable, affordable, EU-based

Deployment Flow:

bash
# 1. Build Docker images
docker build -t transcriptai-frontend:latest ./frontend
docker build -t transcriptai-backend:latest ./backend

# 2. Push to CapRover
./deploy-frontend.sh
./deploy-backend.sh

# 3. CapRover handles:
#    - Container orchestration
#    - SSL certificate renewal
#    - Zero-downtime deployments
#    - Health checks and auto-restart

Database: PostgreSQL on same VPS

  • Automated daily backups
  • Replication for disaster recovery
  • Connection pooling (25 max connections)
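
For reference, the 25-connection cap is applied when the pool is opened. The sketch below uses database/sql with the lib/pq driver; the idle-connection count and max lifetime are illustrative defaults, not the exact production settings:

go
// Sketch of the connection pool setup. Only the 25-connection cap comes from
// the text above; the idle count and lifetime are illustrative defaults.
package db

import (
    "database/sql"
    "time"

    _ "github.com/lib/pq" // PostgreSQL driver
)

func Open(dsn string) (*sql.DB, error) {
    pool, err := sql.Open("postgres", dsn)
    if err != nil {
        return nil, err
    }
    pool.SetMaxOpenConns(25)                  // hard cap on concurrent connections
    pool.SetMaxIdleConns(10)                  // keep a few warm connections around
    pool.SetConnMaxLifetime(30 * time.Minute) // recycle long-lived connections
    return pool, pool.Ping()
}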

Reverse Proxy: Nginx

  • SSL/TLS termination (Let's Encrypt)
  • Gzip compression
  • Static asset caching
  • Rate limiting (1000 req/min)

CI/CD (Future Enhancement)

Planned GitHub Actions workflow:

yaml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - Lint (ESLint, golangci-lint)
      - Type check (TypeScript, Go)
      - Unit tests (backend 85%+ coverage)
      - Integration tests (API endpoints)

  deploy:
    needs: test
    steps:
      - Build Docker images
      - Push to CapRover
      - Run smoke tests
      - Notify team (Slack/Discord)

🌍 Production URLs

  • Primary: https://transcriptai.conn.digital
  • Secondary: https://transcriptai.serverplus.org

Both URLs are live and fully functional, with the same features and performance.


📊 What's Next?

We're continuously improving TranscriptAI. Here's the roadmap:

Short-Term (Q4 2025)

  • Multi-language transcript support (beyond English)
  • Video chapter detection and navigation
  • Browser extension (Chrome, Firefox)
  • Batch processing (analyze multiple videos at once)

Mid-Term (Q1 2026)

  • User accounts and history
  • Custom AI prompt templates
  • Transcript search across all saved videos
  • API access for developers
  • Webhook integrations (Zapier, Make)

Long-Term (Q2 2026+)

  • Mobile apps (iOS, Android)
  • Real-time transcript generation during live streams
  • Collaboration features (teams, shared workspaces)
  • Advanced analytics (topic clustering, trend analysis)
  • Integration with note-taking apps (Notion, Obsidian)

💡 Why We Built This

As developers and content creators at Conn.Digital, we found ourselves constantly:

  • Rewatching conference talks to find that one code snippet
  • Copying quotes from podcasts for blog posts
  • Trying to remember action items from client calls
  • Wishing we could search inside video content like we search documents

We built TranscriptAI to solve our own problems. If it helps you too, that's the best outcome we could ask for.


🤝 Open Source Philosophy

While TranscriptAI is currently a proprietary project, I'm committed to:

  • Sharing knowledge: Technical blog posts like this one
  • Open discussions: Join our Discord for architecture talks
  • Free beta access: No paywalls during development
  • Future considerations: May open-source core components if there's community interest

Interested in the technical details? Have questions? Want to collaborate?

Reach out via the links in the Get in Touch section at the end of this post.


🎓 Technical Insights for Developers

Key Takeaways

  1. Proxy layers protect infrastructure: Never expose backend URLs to clients
  2. Multi-provider AI is smart: Avoid vendor lock-in, optimize costs
  3. Database indexing is critical: 10x performance gains with proper indexes
  4. TypeScript + Go = great combo: Type safety on both ends
  5. Accessibility matters: prefers-reduced-motion, WCAG compliance, etc.
  6. Caching saves money: 95% cache hit rate = 95% fewer API calls

Code Snippets Worth Stealing

1. Next.js Server-Side Proxy Pattern:

typescript
// src/lib/server/backend-client.ts
const backendBaseUrl =
  process.env.BACKEND_API_URL ??
  process.env.NEXT_PUBLIC_BACKEND_URL ??
  "http://localhost:8080";

export async function backendFetch(path: string, init?: RequestInit) {
  const url = backendBaseUrl + path;
  return fetch(url, { ...init, cache: "no-store" });
}

2. Go AI Provider Interface:

go
type AIProvider interface {
    GenerateSummary(ctx context.Context, transcript string, summaryType string) (*SummaryResult, error)
}

type OpenAIProvider struct { /* ... */ }
type GeminiProvider struct { /* ... */ }

func NewAIProvider() AIProvider {
    provider := os.Getenv("AI_PROVIDER")
    switch provider {
    case "openai":
        return &OpenAIProvider{}
    case "google":
        return &GeminiProvider{}
    default:
        return &GeminiProvider{}
    }
}

3. Database Query Optimization:

sql
-- BEFORE (slow: ~15ms)
SELECT * FROM transcripts WHERE video_id = $1;

-- AFTER (fast: ~0.74ms)
CREATE INDEX idx_transcripts_video_id ON transcripts(video_id);
SELECT * FROM transcripts WHERE video_id = $1;
-- Now uses index scan instead of sequential scan

🏆 Results

After 5 weeks of intensive development:

  • Fully functional platform with 6 core features
  • Production-ready infrastructure on CapRover
  • Zero TypeScript/ESLint errors (clean codebase)
  • High performance (1-2s average response time)
  • Secure architecture (backend hidden, input validated)
  • Accessible design (WCAG 2.1 AA compliant)
  • Cost-effective (~$20/month operating cost)
  • Scalable (ready for 10,000+ users)

🙏 Acknowledgments

Built with love by the Conn.Digital team.

Special thanks to:

  • Next.js Team for an incredible framework
  • Vercel for excellent documentation
  • Shadcn for beautiful UI components
  • Google for affordable AI (Gemini)
  • CapRover for simplified deployment
  • Open source community for inspiration and tools

📞 Get in Touch

Try TranscriptAI: https://transcriptai.conn.digital

Questions? Feedback? Collaboration opportunities?

I'd love to hear from you!