
Building TranscriptAI: An AI-Powered YouTube Transcript Analysis Platform

By Constantin ET · 10/24/2025 · 15 min read
Tags: ai, web development, nextjs, golang, postgres, typescript, product development

🎯 The Problem

As a developer, content creator, or researcher, you've probably found yourself in this situation: watching a 2-hour YouTube tutorial or conference talk, desperately trying to find that one code snippet or quote mentioned 45 minutes in. You scrub through the timeline, rewatch sections, and waste valuable time.

YouTube's built-in transcript feature exists, but it's basic: you get raw text with timestamps and nothing more. No AI summaries, no code extraction, no intelligent search. That's where TranscriptAI comes in.


✨ What is TranscriptAI?

TranscriptAI (https://transcriptai.conn.digital) is a free, AI-powered platform that transforms YouTube videos into actionable insights. It's not just a transcript downloader; it's an intelligent analysis tool that helps you:

  • 📝 Download transcripts instantly in multiple formats
  • 🤖 Generate AI summaries (brief, detailed, or key points)
  • 💻 Extract code snippets automatically with syntax highlighting
  • 💬 Pull out important quotes with speaker context
  • ✅ Identify action items from meetings and tutorials
  • 🔍 Ask questions about the video content without watching it

Live Demo: https://transcriptai.conn.digital


🏗️ The Tech Stack

We built TranscriptAI with a modern, production-grade architecture:

Frontend

  • Next.js 15 (React framework with App Router)
  • TypeScript (type safety throughout)
  • Tailwind CSS (utility-first styling)
  • Shadcn/ui (beautiful, accessible components)

Backend

  • Go (Golang) (high-performance API server)
  • PostgreSQL (reliable, scalable database)
  • Google Gemini AI (cost-effective AI analysis)
  • Docker (containerized deployment)

Infrastructure

  • CapRover (self-hosted PaaS for deployment)
  • Nginx (reverse proxy and load balancing)
  • Server Plus (VPS hosting)

🎨 Design Philosophy

We built TranscriptAI with three core principles:

1. Privacy First

  • No sign-up required during beta
  • No data sold to third parties
  • Transparent about what we collect (transcripts for caching only)

2. Performance Matters

  • Average response time: 1-2 seconds for 40-minute videos
  • Smart caching (process once, retrieve instantly)
  • Database indexing for lightning-fast queries

3. User Experience is Everything

  • Clean, modern interface
  • Smooth animations (60fps, accessibility-aware)
  • Dark mode support
  • Mobile-responsive design

🔧 Technical Deep Dive

Architecture: The Proxy Layer Pattern

One of the most interesting technical challenges was hiding our backend infrastructure from clients while maintaining performance. Here's how we solved it:

Problem: Direct client-to-backend communication exposes your infrastructure:

❌ Browser → https://backend.example.com/api/...
   (Backend URL visible in browser DevTools)

Solution: Server-side proxy layer with Next.js API routes:

✅ Browser → Next.js Proxy → Go Backend (hidden)
   (Browser only sees /api/transcripts/*)

Implementation:

typescript
// src/app/api/transcripts/fetch/route.ts
import { NextRequest, NextResponse } from "next/server";
import { backendFetch } from "@/lib/server/backend-client";

export async function POST(request: NextRequest) {
  const body = await request.text();

  // Forward to hidden backend
  const backendResponse = await backendFetch(
    '/api/v1/transcripts/fetch',
    { method: 'POST', body }
  );

  return NextResponse.json(await backendResponse.json());
}

Benefits:

  • 🔒 Backend URL completely hidden from browser
  • 🔒 Professional, clean API surface (/api/transcripts/*)
  • 🔒 Same-origin requests (better security)
  • 🚀 No performance impact (Next.js handles caching)

Next.js 15 Migration: Async Params

We encountered Next.js 15's breaking change with dynamic route parameters. Previously synchronous params became async Promises:

Before (Next.js 14):

typescript
export async function GET(
  request: NextRequest,
  { params }: { params: { id: string } }
) {
  const id = params.id; // Direct access
}

After (Next.js 15):

typescript
export async function GET(
  request: NextRequest,
  context: { params: Promise<{ id: string }> }
) {
  const params = await context.params; // Await the Promise
  const id = params.id;
}

This required updating 5 API routes, but ensured full Next.js 15 compatibility with zero TypeScript errors.

AI Integration: Multi-Provider Strategy

We designed a flexible AI provider system supporting multiple backends:

go
// ai_service.go
type AIProvider interface {
    GenerateSummary(transcript string, summaryType string) (*SummaryResult, error)
    ExtractContent(transcript string, extractionType string) (*ExtractionResult, error)
    AnswerQuestion(transcript string, question string) (*QAResult, error)
}

// Implementations: OpenAI, Anthropic Claude, Google Gemini

Why this matters:

  • 🔄 Switch AI providers via environment variable
  • 💰 Compare costs (Gemini is ~90% cheaper than GPT-4)
  • 🚀 Fallback options if one provider is down
  • 🧪 A/B test different models for quality

Current setup: Google Gemini 2.5 Flash

  • Cost: ~$0.001 per request
  • Speed: 1-2 seconds average
  • Quality: Excellent for summaries and extractions

Database Optimization

Smart caching reduces API costs and improves performance:

sql
-- Migration 002: Performance indexes
CREATE INDEX idx_videos_youtube_id ON videos(youtube_id);
CREATE INDEX idx_transcripts_video_id ON transcripts(video_id);
CREATE INDEX idx_transcripts_video_language ON transcripts(video_id, language);

-- Query performance
SELECT * FROM videos WHERE youtube_id = 'dQw4w9WgXcQ';
-- Result: ~0.74ms (with index) vs ~15ms (without)

Caching strategy:

  1. User requests a transcript for video XYZ
  2. Check the cache: SELECT * FROM transcripts WHERE video_id = 'XYZ'
  3. If found: return instantly (<30ms)
  4. If not found: fetch from YouTube, cache it, then return (~1-2s)
  5. Future requests: instant retrieval from cache

Result: 95%+ cache hit rate after the first week
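
Under the hood, this flow is only a few lines of Go. The sketch below assumes a transcripts table with a unique constraint on (video_id, language); the helper and column names are illustrative rather than the exact production code:

go
// A cache-first transcript lookup: check PostgreSQL before calling YouTube.
// Sketch only: table/column names and the YouTube helper are illustrative.
package transcripts

import (
    "context"
    "database/sql"
    "errors"
)

// fetchFromYouTube stands in for the real YouTube transcript fetcher.
func fetchFromYouTube(ctx context.Context, videoID, lang string) (string, error) {
    return "", errors.New("not implemented in this sketch")
}

// GetTranscript returns a cached transcript when available; otherwise it
// fetches, caches, and returns it.
func GetTranscript(ctx context.Context, db *sql.DB, videoID, lang string) (string, error) {
    var content string
    err := db.QueryRowContext(ctx,
        `SELECT content FROM transcripts WHERE video_id = $1 AND language = $2`,
        videoID, lang).Scan(&content)
    if err == nil {
        return content, nil // cache hit: served in milliseconds
    }
    if !errors.Is(err, sql.ErrNoRows) {
        return "", err // real database error
    }

    // Cache miss: fetch from YouTube (~1-2s), then store for future requests.
    content, err = fetchFromYouTube(ctx, videoID, lang)
    if err != nil {
        return "", err
    }
    _, err = db.ExecContext(ctx,
        `INSERT INTO transcripts (video_id, language, content)
         VALUES ($1, $2, $3)
         ON CONFLICT (video_id, language) DO NOTHING`,
        videoID, lang, content)
    return content, err
}

Because the insert uses ON CONFLICT DO NOTHING, two concurrent requests for the same video can't corrupt the cache; one of them simply wins.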


🚀 Development Journey

Phase 1: Foundation (Week 1)

  • Go backend with basic transcript fetching
  • PostgreSQL database schema
  • Docker containerization
  • Initial API endpoints

Phase 2: Frontend (Week 1-2)

  • Next.js 15 setup with TypeScript
  • Shadcn/ui component library integration
  • Responsive design implementation
  • Dark mode theming

Phase 3: AI Features (Week 2-3)

  • Multi-provider AI service architecture
  • Summary generation (brief, detailed, key points)
  • Content extraction (code, quotes, action items)
  • Q&A system with citation support

Phase 4: Polish (Week 3-4)

  • Performance optimization (database indexing)
  • Error handling and edge cases
  • UX improvements (animations, loading states)
  • Accessibility features (WCAG 2.1 AA)

Phase 5: Production (Week 4)

  • CapRover deployment setup
  • Nginx configuration
  • SSL/TLS certificates
  • Monitoring and logging

Phase 6: Security Hardening (Week 5)

  • Server-side proxy layer implementation
  • Environment variable security
  • Input validation and sanitization
  • CORS configuration

📊 Performance Metrics

After 4 weeks of development and optimization:

Speed

  • Transcript fetch: 1-2 seconds (40-min video)
  • AI summary: 1.6-2.1 seconds (cached: <30ms)
  • Code extraction: 2-3 seconds
  • Q&A response: 1.5-2.5 seconds

Reliability

  • Uptime: 99.9% (CapRover + health checks)
  • Error rate: <0.1%
  • Cache hit rate: 95%+

Cost Efficiency

  • AI cost per request: ~$0.001 (Gemini)
  • Hosting: $20/month (VPS)
  • Database: Included in VPS
  • Total: ~$20/month for unlimited users (during beta)

Code Quality

  • TypeScript errors: 0
  • ESLint warnings: 0
  • Test coverage: >85% (backend)
  • Lighthouse score: 95+ (performance, accessibility, SEO)

🎯 Key Features Explained

1. Smart Transcript Fetching

Unlike basic YouTube transcript downloaders, TranscriptAI:

  • Supports multiple languages
  • Handles auto-generated vs manual transcripts
  • Preserves accurate timestamps
  • Caches results for instant retrieval
  • Handles edge cases (private videos, age-restricted content)
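
As a rough illustration of the "auto-generated vs manual" handling, the fetcher can rank the available caption tracks before downloading anything. The types and function below are a sketch, not the production implementation:

go
// Sketch: prefer a manually created caption track in the requested language,
// fall back to an auto-generated one. Types and names are illustrative.
package captions

type Track struct {
    Language      string
    AutoGenerated bool
}

// Pick returns the best available track for the requested language.
func Pick(tracks []Track, lang string) (Track, bool) {
    var fallback *Track
    for i := range tracks {
        if tracks[i].Language != lang {
            continue
        }
        if !tracks[i].AutoGenerated {
            return tracks[i], true // manual transcript wins
        }
        if fallback == nil {
            fallback = &tracks[i] // remember the auto-generated option
        }
    }
    if fallback != nil {
        return *fallback, true
    }
    return Track{}, false // nothing available in the requested language
}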

2. AI-Powered Summaries

Generate three types of summaries:

Brief (2-3 sentences):

"This tutorial covers React hooks including useState and useEffect. The instructor demonstrates building a todo app with real-time updates. Key concepts include component lifecycle and state management patterns."

Detailed (multi-paragraph):

Comprehensive analysis with sections, context, and detailed explanations.

Key Points (bulleted list):

  • useState manages local component state
  • useEffect handles side effects and cleanup
  • Custom hooks enable code reuse
  • Context API prevents prop drilling

3. Content Extraction

Code Snippets:

typescript
// Automatically extracted with:
// - Language detection (TypeScript)
// - Syntax highlighting
// - Context explanation
// - Line number references

function useFetch<T>(url: string): { data: T | null, loading: boolean } {
  const [data, setData] = useState<T | null>(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    fetch(url).then(res => res.json()).then(setData).finally(() => setLoading(false));
  }, [url]);

  return { data, loading };
}

Quotes:

"The key to understanding React is realizing that components are just functions that return UI."

  • Speaker: Tutorial Instructor | Importance: High

Action Items:

  • HIGH: Refactor authentication logic to use JWT tokens
  • MEDIUM: Add error boundary to prevent app crashes
  • LOW: Update documentation with new API endpoints

4. Intelligent Q&A

Ask questions about the video without watching it:

Question: "What is the main promise made in this song?"

Answer: The central promise is one of unwavering loyalty and commitment. The singer pledges to never give up on the relationship, never let the other person down, never run around and desert them, never make them cry, never say goodbye, and never tell a lie to hurt them.

Confidence: High

Sources:

"Never going to give you up. Never going to let you down..." "Never going to run around and desert you..."


🔒 Security Considerations

Security was a top priority throughout development:

1. Backend URL Protection

  • ✅ Implemented: Server-side proxy layer hides backend infrastructure
  • ✅ Verified: Browser DevTools shows zero backend requests
  • ✅ Benefit: Professional API surface, enhanced security

2. Input Validation

  • ✅ URL validation: only valid YouTube URLs are accepted
  • ✅ Language validation: ISO 639-1 code enforcement
  • ✅ Request timeouts: 30-second limit prevents hanging requests
  • ✅ SQL injection prevention: parameterized queries only
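
Here's what that URL validation can look like in Go. Treat it as a minimal sketch: the accepted hostnames and the 11-character video ID pattern are illustrative assumptions, not the exact production rules.

go
// Sketch of the URL validation: accept only recognizable YouTube URLs.
// Hostnames and the 11-character ID pattern are illustrative assumptions.
package validation

import (
    "errors"
    "net/url"
    "regexp"
    "strings"
)

var videoIDPattern = regexp.MustCompile(`^[A-Za-z0-9_-]{11}$`)

// ExtractVideoID returns the video ID or an error for non-YouTube input.
func ExtractVideoID(raw string) (string, error) {
    u, err := url.Parse(raw)
    if err != nil {
        return "", errors.New("invalid URL")
    }

    var id string
    switch strings.TrimPrefix(u.Hostname(), "www.") {
    case "youtube.com", "m.youtube.com":
        id = u.Query().Get("v")
    case "youtu.be":
        id = strings.TrimPrefix(u.Path, "/")
    default:
        return "", errors.New("only YouTube URLs are accepted")
    }

    if !videoIDPattern.MatchString(id) {
        return "", errors.New("invalid YouTube video ID")
    }
    return id, nil
}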

3. Environment Variables

  • ✅ Secrets management: API keys live in CapRover env vars (not .env files)
  • ✅ No hardcoded credentials: all sensitive data is externalized
  • ✅ Production validation: strong password enforcement

4. Error Handling

  • ✅ No stack traces exposed: user-friendly error messages only
  • ✅ Detailed server logging: full errors are logged for debugging
  • ✅ Graceful degradation: the app remains functional during API failures
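
The "no stack traces exposed" rule boils down to one small helper: log the full error server-side, return a generic message to the client. A minimal sketch (the handler shape and names are illustrative):

go
// Sketch of the error-handling pattern: log full details on the server,
// return only a generic message to the client. Names are illustrative.
package api

import (
    "encoding/json"
    "log"
    "net/http"
)

func writeError(w http.ResponseWriter, err error, status int) {
    // The full error (including wrapped causes) goes to the server log only.
    log.Printf("request failed: %v", err)

    // The browser never sees stack traces or backend details.
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(status)
    _ = json.NewEncoder(w).Encode(map[string]string{
        "error": "Something went wrong while processing your request. Please try again.",
    })
}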

5. CORS Configuration

  • ✅ Restricted origins: only allowed domains are accepted
  • ✅ Same-origin proxy requests: no CORS issues in production
  • ✅ Secure headers: CSP, X-Frame-Options, etc.
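
A stripped-down version of that middleware in Go could look like the following; it's a sketch of the approach with the header set simplified, not the exact production configuration:

go
// Sketch of a CORS/security-header middleware restricted to known origins.
// The production header set (CSP, etc.) is more complete than shown here.
package middleware

import "net/http"

var allowedOrigins = map[string]bool{
    "https://transcriptai.conn.digital":   true,
    "https://transcriptai.serverplus.org": true,
}

func CORS(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if origin := r.Header.Get("Origin"); allowedOrigins[origin] {
            w.Header().Set("Access-Control-Allow-Origin", origin)
            w.Header().Set("Vary", "Origin")
        }
        // Baseline security headers on every response.
        w.Header().Set("X-Frame-Options", "DENY")
        w.Header().Set("X-Content-Type-Options", "nosniff")

        if r.Method == http.MethodOptions {
            w.WriteHeader(http.StatusNoContent) // preflight handled here
            return
        }
        next.ServeHTTP(w, r)
    })
}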


🎨 UX/UI Highlights

Smooth Animations (60fps)

All animations respect prefers-reduced-motion for accessibility:

css
/* Custom Tailwind animations */
@keyframes fade-in {
  from { opacity: 0; }
  to { opacity: 1; }
}

@keyframes slide-in-right {
  from { transform: translateX(100%); opacity: 0; }
  to { transform: translateX(0); opacity: 1; }
}

@keyframes shimmer {
  0% { background-position: -1000px 0; }
  100% { background-position: 1000px 0; }
}

Animated components:

  • Loading spinners with smooth rotation
  • Skeleton loaders with shimmer effect
  • Toast notifications with slide-in
  • Button ripple effects
  • Card hover lift effects
  • Modal scale-in entrance

Responsive Design

Breakpoints optimized for all devices:

  • Mobile: 375px - 640px (optimized for iPhone, Android)
  • Tablet: 640px - 1024px (iPad, Android tablets)
  • Desktop: 1024px - 1920px (laptops, monitors)
  • Large: 1920px+ (4K displays)

Layout scaling: an 80% base font-size keeps proportions consistent across zoom levels, so the layout developed at 80% browser zoom looks the same at the 100% zoom users see in production.

Dark Mode

Full dark mode support with smooth transitions:

typescript
// Theme provider with system preference detection
<ThemeProvider
  attribute="class"
  defaultTheme="system"
  enableSystem
  disableTransitionOnChange={false}
>
  {children}
</ThemeProvider>

Colors optimized for:

  • ✅ WCAG 2.1 AA contrast ratios
  • ✅ Reduced eye strain in dark environments
  • ✅ Consistent brand identity across themes

📈 Lessons Learned

1. Next.js 15 Breaking Changes

  • Challenge: Async params broke all dynamic routes
  • Solution: Systematic migration to Promise-based params
  • Takeaway: Always test major framework upgrades in staging first

2. AI Provider Selection

  • Challenge: GPT-4 costs were prohibitive at scale
  • Solution: Multi-provider architecture; switched to Gemini
  • Takeaway: Design for flexibility from day one (cost optimization later)

3. Database Indexing

  • Challenge: Queries became slow with 1000+ cached transcripts
  • Solution: Strategic composite indexes on frequently queried columns
  • Takeaway: Profile early, optimize often (10x speedup with proper indexes)

4. Browser Zoom Consistency

  • Challenge: A layout developed at 80% zoom looked different at 100%
  • Solution: Set html { font-size: 80%; } for proportional scaling
  • Takeaway: Define the design system with rem units for scalability

5. Error Message Security

  • Challenge: Stack traces leaking backend details
  • Solution: Sanitize errors client-side, log details server-side
  • Takeaway: Never trust error messages to be safe for users

6. TypeScript Configuration

  • Challenge: Next.js validator errors in .next/types
  • Solution: Proper async/await patterns for Next.js 15 API routes
  • Takeaway: Framework-specific patterns matter for type safety


🚀 Deployment Strategy

Infrastructure

Hosting: CapRover on Server Plus VPS

  • Why CapRover: Self-hosted Heroku alternative, free (just VPS cost)
  • Why Server Plus: Reliable, affordable, EU-based

Deployment Flow:

bash
# 1. Build Docker images
docker build -t transcriptai-frontend:latest ./frontend
docker build -t transcriptai-backend:latest ./backend

# 2. Push to CapRover
./deploy-frontend.sh
./deploy-backend.sh

# 3. CapRover handles:
#    - Container orchestration
#    - SSL certificate renewal
#    - Zero-downtime deployments
#    - Health checks and auto-restart

Database: PostgreSQL on same VPS

  • Automated daily backups
  • Replication for disaster recovery
  • Connection pooling (25 max connections)
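
For reference, the 25-connection cap is applied when the pool is opened. The sketch below uses database/sql with the lib/pq driver; the idle-connection count and max lifetime are illustrative defaults, not the exact production settings:

go
// Sketch of the connection pool setup. Only the 25-connection cap comes from
// the text above; the idle count and lifetime are illustrative defaults.
package db

import (
    "database/sql"
    "time"

    _ "github.com/lib/pq" // PostgreSQL driver
)

func Open(dsn string) (*sql.DB, error) {
    pool, err := sql.Open("postgres", dsn)
    if err != nil {
        return nil, err
    }
    pool.SetMaxOpenConns(25)                  // hard cap on concurrent connections
    pool.SetMaxIdleConns(10)                  // keep a few warm connections around
    pool.SetConnMaxLifetime(30 * time.Minute) // recycle long-lived connections
    return pool, pool.Ping()
}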

Reverse Proxy: Nginx

  • SSL/TLS termination (Let's Encrypt)
  • Gzip compression
  • Static asset caching
  • Rate limiting (1000 req/min)

CI/CD (Future Enhancement)

Planned GitHub Actions workflow:

yaml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - Lint (ESLint, golangci-lint)
      - Type check (TypeScript, Go)
      - Unit tests (backend 85%+ coverage)
      - Integration tests (API endpoints)

  deploy:
    needs: test
    steps:
      - Build Docker images
      - Push to CapRover
      - Run smoke tests
      - Notify team (Slack/Discord)

🌍 Production URLs

  • Primary: https://transcriptai.conn.digital
  • Secondary: https://transcriptai.serverplus.org

Both URLs are live and fully functional, with the same features and performance.


📊 What's Next?

We're continuously improving TranscriptAI. Here's the roadmap:

Short-Term (Q4 2025)

  • Multi-language transcript support (beyond English)
  • Video chapter detection and navigation
  • Browser extension (Chrome, Firefox)
  • Batch processing (analyze multiple videos at once)

Mid-Term (Q1 2026)

  • User accounts and history
  • Custom AI prompt templates
  • Transcript search across all saved videos
  • API access for developers
  • Webhook integrations (Zapier, Make)

Long-Term (Q2 2026+)

  • Mobile apps (iOS, Android)
  • Real-time transcript generation during live streams
  • Collaboration features (teams, shared workspaces)
  • Advanced analytics (topic clustering, trend analysis)
  • Integration with note-taking apps (Notion, Obsidian)

💡 Why We Built This

As developers and content creators at Conn.Digital, we found ourselves constantly:

  • Rewatching conference talks to find that one code snippet
  • Copying quotes from podcasts for blog posts
  • Trying to remember action items from client calls
  • Wishing we could search inside video content like we search documents

We built TranscriptAI to solve our own problems. If it helps you too, that's the best outcome we could ask for.


🤝 Open Source Philosophy

While TranscriptAI is currently a proprietary project, I'm committed to:

  • Sharing knowledge: Technical blog posts like this one
  • Open discussions: Join our Discord for architecture talks
  • Free beta access: No paywalls during development
  • Future considerations: May open-source core components if there's community interest

Interested in the technical details? Have questions? Want to collaborate?

Reach out via the links in the Get in Touch section at the end of this post.


🎓 Technical Insights for Developers

Key Takeaways

  1. Proxy layers protect infrastructure: Never expose backend URLs to clients
  2. Multi-provider AI is smart: Avoid vendor lock-in, optimize costs
  3. Database indexing is critical: 10x performance gains with proper indexes
  4. TypeScript + Go = great combo: Type safety on both ends
  5. Accessibility matters: prefers-reduced-motion, WCAG compliance, etc.
  6. Caching saves money: 95% cache hit rate = 95% fewer API calls

Code Snippets Worth Stealing

1. Next.js Server-Side Proxy Pattern:

typescript
// src/lib/server/backend-client.ts
const backendBaseUrl =
  process.env.BACKEND_API_URL ??
  process.env.NEXT_PUBLIC_BACKEND_URL ??
  "http://localhost:8080";

export async function backendFetch(path: string, init?: RequestInit) {
  const url = backendBaseUrl + path;
  return fetch(url, { ...init, cache: "no-store" });
}

2. Go AI Provider Interface:

go
type AIProvider interface {
    GenerateSummary(ctx context.Context, transcript string, summaryType string) (*SummaryResult, error)
}

type OpenAIProvider struct { /* ... */ }
type GeminiProvider struct { /* ... */ }

func NewAIProvider() AIProvider {
    provider := os.Getenv("AI_PROVIDER")
    switch provider {
    case "openai":
        return &OpenAIProvider{}
    case "google":
        return &GeminiProvider{}
    default:
        return &GeminiProvider{}
    }
}

3. Database Query Optimization:

sql
-- BEFORE (slow: ~15ms)
SELECT * FROM transcripts WHERE video_id = $1;

-- AFTER (fast: ~0.74ms)
CREATE INDEX idx_transcripts_video_id ON transcripts(video_id);
SELECT * FROM transcripts WHERE video_id = $1;
-- Now uses index scan instead of sequential scan

🏆 Results

After 5 weeks of intensive development:

  • Fully functional platform with 6 core features
  • Production-ready infrastructure on CapRover
  • Zero TypeScript/ESLint errors (clean codebase)
  • High performance (1-2s average response time)
  • Secure architecture (backend hidden, input validated)
  • Accessible design (WCAG 2.1 AA compliant)
  • Cost-effective (~$20/month operating cost)
  • Scalable (ready for 10,000+ users)

🙏 Acknowledgments

Built with love by the Conn.Digital team.

Special thanks to:

  • Next.js Team for an incredible framework
  • Vercel for excellent documentation
  • Shadcn for beautiful UI components
  • Google for affordable AI (Gemini)
  • CapRover for simplified deployment
  • Open source community for inspiration and tools

📞 Get in Touch

Try TranscriptAI: https://transcriptai.conn.digital

Questions? Feedback? Collaboration opportunities?

I'd love to hear from you!