🎯 The Problem
As a developer, content creator, or researcher, you've probably found yourself in this situation: watching a 2-hour YouTube tutorial or conference talk, desperately trying to find that one code snippet or quote mentioned 45 minutes in. You scrub through the timeline, rewatch sections, and waste valuable time.
YouTube's built-in transcript feature exists, but it's basic. You get raw text with timestamps-nothing more. No AI summaries, no code extraction, no intelligent search. That's where TranscriptAI comes in.
✨ What is TranscriptAI?
TranscriptAI (https://transcriptai.conn.digital) is a free, AI-powered platform that transforms YouTube videos into actionable insights. It's not just a transcript downloader-it's an intelligent analysis tool that helps you:
- 📝 Download transcripts instantly in multiple formats
- 🤖 Generate AI summaries (brief, detailed, or key points)
- 💻 Extract code snippets automatically with syntax highlighting
- 💬 Pull out important quotes with speaker context
- ✅ Identify action items from meetings and tutorials
- 🔍 Ask questions about the video content without watching it
Live Demo: https://transcriptai.conn.digital
🏗️ The Tech Stack
We built TranscriptAI with a modern, production-grade architecture:
Frontend
- Next.js 15 (React framework with App Router)
- TypeScript (type safety throughout)
- Tailwind CSS (utility-first styling)
- Shadcn/ui (beautiful, accessible components)
Backend
- Go (Golang) (high-performance API server)
- PostgreSQL (reliable, scalable database)
- Google Gemini AI (cost-effective AI analysis)
- Docker (containerized deployment)
Infrastructure
- CapRover (self-hosted PaaS for deployment)
- Nginx (reverse proxy and load balancing)
- Server Plus (VPS hosting)
🎨 Design Philosophy
We built TranscriptAI with three core principles:
1. Privacy First
- No sign-up required during beta
- No data sold to third parties
- Transparent about what we collect (transcripts for caching only)
2. Performance Matters
- Average response time: 1-2 seconds for 40-minute videos
- Smart caching (process once, retrieve instantly)
- Database indexing for lightning-fast queries
3. User Experience is Everything
- Clean, modern interface
- Smooth animations (60fps, accessibility-aware)
- Dark mode support
- Mobile-responsive design
🔧 Technical Deep Dive
Architecture: The Proxy Layer Pattern
One of the most interesting technical challenges was hiding our backend infrastructure from clients while maintaining performance. Here's how we solved it:
Problem: Direct client-to-backend communication exposes your infrastructure:
❌ Browser → https://backend.example.com/api/...
(Backend URL visible in browser DevTools)Solution: Server-side proxy layer with Next.js API routes:
✅ Browser → Next.js Proxy → Go Backend (hidden)
(Browser only sees /api/transcripts/*)Implementation:
// src/app/api/transcripts/fetch/route.ts
export async function POST(request: NextRequest) {
const body = await request.text();
// Forward to hidden backend
const backendResponse = await backendFetch(
'/api/v1/transcripts/fetch',
{ method: 'POST', body }
);
return NextResponse.json(await backendResponse.json());
}Benefits:
- 🔒 Backend URL completely hidden from browser
- 🔒 Professional, clean API surface ()tsx
/api/transcripts/* - 🔒 Same-origin requests (better security)
- 🚀 No performance impact (Next.js handles caching)
Next.js 15 Migration: Async Params
We encountered Next.js 15's breaking change with dynamic route parameters. Previously synchronous params became async Promises:
Before (Next.js 14):
export async function GET(
request: NextRequest,
{ params }: { params: { id: string } }
) {
const id = params.id; // Direct access
}After (Next.js 15):
export async function GET(
request: NextRequest,
context: { params: Promise<{ id: string }> }
) {
const params = await context.params; // Await the Promise
const id = params.id;
}This required updating 5 API routes, but ensured full Next.js 15 compatibility with zero TypeScript errors.
AI Integration: Multi-Provider Strategy
We designed a flexible AI provider system supporting multiple backends:
// ai_service.go
type AIProvider interface {
GenerateSummary(transcript string, summaryType string) (*SummaryResult, error)
ExtractContent(transcript string, extractionType string) (*ExtractionResult, error)
AnswerQuestion(transcript string, question string) (*QAResult, error)
}
// Implementations: OpenAI, Anthropic Claude, Google GeminiWhy this matters:
- 🔄 Switch AI providers via environment variable
- 💰 Compare costs (Gemini is ~90% cheaper than GPT-4)
- 🚀 Fallback options if one provider is down
- 🧪 A/B test different models for quality
Current setup: Google Gemini 2.5 Flash
- Cost: ~$0.001 per request
- Speed: 1-2 seconds average
- Quality: Excellent for summaries and extractions
Database Optimization
Smart caching reduces API costs and improves performance:
-- Migration 002: Performance indexes
CREATE INDEX idx_videos_youtube_id ON videos(youtube_id);
CREATE INDEX idx_transcripts_video_id ON transcripts(video_id);
CREATE INDEX idx_transcripts_video_language ON transcripts(video_id, language);
-- Query performance
SELECT * FROM videos WHERE youtube_id = 'dQw4w9WgXcQ';
-- Result: ~0.74ms (with index) vs ~15ms (without)Caching strategy:
- User requests transcript for video tsx
XYZ - Check cache: tsx
SELECT * FROM transcripts WHERE video_id = 'XYZ' - If found: Return instantly (<30ms)
- If not found: Fetch from YouTube, cache, then return (~1-2s)
- Future requests: Instant retrieval from cache
Result: 95%+ cache hit rate after the first week
🚀 Development Journey
Phase 1: Foundation (Week 1)
- Go backend with basic transcript fetching
- PostgreSQL database schema
- Docker containerization
- Initial API endpoints
Phase 2: Frontend (Week 1-2)
- Next.js 15 setup with TypeScript
- Shadcn/ui component library integration
- Responsive design implementation
- Dark mode theming
Phase 3: AI Features (Week 2-3)
- Multi-provider AI service architecture
- Summary generation (brief, detailed, key points)
- Content extraction (code, quotes, action items)
- Q&A system with citation support
Phase 4: Polish (Week 3-4)
- Performance optimization (database indexing)
- Error handling and edge cases
- UX improvements (animations, loading states)
- Accessibility features (WCAG 2.1 AA)
Phase 5: Production (Week 4)
- CapRover deployment setup
- Nginx configuration
- SSL/TLS certificates
- Monitoring and logging
Phase 6: Security Hardening (Week 5)
- Server-side proxy layer implementation
- Environment variable security
- Input validation and sanitization
- CORS configuration
📊 Performance Metrics
After 4 weeks of development and optimization:
Speed
- Transcript fetch: 1-2 seconds (40-min video)
- AI summary: 1.6-2.1 seconds (cached: <30ms)
- Code extraction: 2-3 seconds
- Q&A response: 1.5-2.5 seconds
Reliability
- Uptime: 99.9% (CapRover + health checks)
- Error rate: <0.1%
- Cache hit rate: 95%+
Cost Efficiency
- AI cost per request: ~$0.001 (Gemini)
- Hosting: $20/month (VPS)
- Database: Included in VPS
- Total: ~$20/month for unlimited users (during beta)
Code Quality
- TypeScript errors: 0
- ESLint warnings: 0
- Test coverage: >85% (backend)
- Lighthouse score: 95+ (performance, accessibility, SEO)
🎯 Key Features Explained
1. Smart Transcript Fetching
Unlike basic YouTube transcript downloaders, TranscriptAI:
- Supports multiple languages
- Handles auto-generated vs manual transcripts
- Preserves accurate timestamps
- Caches results for instant retrieval
- Handles edge cases (private videos, age-restricted content)
2. AI-Powered Summaries
Generate three types of summaries:
Brief (2-3 sentences):
"This tutorial covers React hooks including useState and useEffect. The instructor demonstrates building a todo app with real-time updates. Key concepts include component lifecycle and state management patterns."
Detailed (multi-paragraph):
Comprehensive analysis with sections, context, and detailed explanations.
Key Points (bulleted list):
- useState manages local component state
- useEffect handles side effects and cleanup
- Custom hooks enable code reuse
- Context API prevents prop drilling
3. Content Extraction
Code Snippets:
// Automatically extracted with:
// - Language detection (TypeScript)
// - Syntax highlighting
// - Context explanation
// - Line number references
function useFetch<T>(url: string): { data: T | null, loading: boolean } {
const [data, setData] = useState<T | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
fetch(url).then(res => res.json()).then(setData).finally(() => setLoading(false));
}, [url]);
return { data, loading };
}Quotes:
"The key to understanding React is realizing that components are just functions that return UI."
- Speaker: Tutorial Instructor | Importance: High
Action Items:
- HIGH: Refactor authentication logic to use JWT tokens
- MEDIUM: Add error boundary to prevent app crashes
- LOW: Update documentation with new API endpoints
4. Intelligent Q&A
Ask questions about the video without watching it:
Question: "What is the main promise made in this song?"
Answer: The central promise is one of unwavering loyalty and commitment. The singer pledges to never give up on the relationship, never let the other person down, never run around and desert them, never make them cry, never say goodbye, and never tell a lie to hurt them.
Confidence: High
Sources:
"Never going to give you up. Never going to let you down..." "Never going to run around and desert you..."
🔒 Security Considerations
Security was a top priority throughout development:
1. Backend URL Protection
✅ Implemented: Server-side proxy layer hides backend infrastructure ✅ Verified: Browser DevTools shows zero backend requests ✅ Benefit: Professional API surface, enhanced security
2. Input Validation
✅ URL validation: Ensures only valid YouTube URLs accepted ✅ Language validation: ISO 639-1 code enforcement ✅ Request timeouts: 30-second limit prevents hanging ✅ SQL injection prevention: Parameterized queries only
3. Environment Variables
✅ Secrets management: API keys in CapRover env vars (not .env files) ✅ No hardcoded credentials: All sensitive data externalized ✅ Production validation: Strong password enforcement
4. Error Handling
✅ No stack traces exposed: User-friendly error messages only ✅ Detailed server logging: Errors logged for debugging ✅ Graceful degradation: App remains functional during API failures
5. CORS Configuration
✅ Restricted origins: Only allowed domains accepted ✅ Same-origin proxy requests: No CORS issues in production ✅ Secure headers: CSP, X-Frame-Options, etc.
🎨 UX/UI Highlights
Smooth Animations (60fps)
All animations respect
prefers-reduced-motion/* Custom Tailwind animations */
@keyframes fade-in {
from { opacity: 0; }
to { opacity: 1; }
}
@keyframes slide-in-right {
from { transform: translateX(100%); opacity: 0; }
to { transform: translateX(0); opacity: 1; }
}
@keyframes shimmer {
0% { background-position: -1000px 0; }
100% { background-position: 1000px 0; }
}Animated components:
- Loading spinners with smooth rotation
- Skeleton loaders with shimmer effect
- Toast notifications with slide-in
- Button ripple effects
- Card hover lift effects
- Modal scale-in entrance
Responsive Design
Breakpoints optimized for all devices:
- Mobile: 375px - 640px (optimized for iPhone, Android)
- Tablet: 640px - 1024px (iPad, Android tablets)
- Desktop: 1024px - 1920px (laptops, monitors)
- Large: 1920px+ (4K displays)
Layout scaling: 80% base font-size ensures consistent appearance across zoom levels (development at 80% zoom, production at 100% zoom).
Dark Mode
Full dark mode support with smooth transitions:
// Theme provider with system preference detection
<ThemeProvider
attribute="class"
defaultTheme="system"
enableSystem
disableTransitionOnChange={false}
>
{children}
</ThemeProvider>Colors optimized for:
- ✅ WCAG 2.1 AA contrast ratios
- ✅ Reduced eye strain in dark environments
- ✅ Consistent brand identity across themes
📈 Lessons Learned
1. Next.js 15 Breaking Changes
Challenge: Async params broke all dynamic routes Solution: Systematic migration to Promise-based params Takeaway: Always test major framework upgrades in staging first
2. AI Provider Selection
Challenge: GPT-4 costs were prohibitive at scale Solution: Multi-provider architecture, switched to Gemini Takeaway: Design for flexibility from day one (cost optimization later)
3. Database Indexing
Challenge: Queries became slow with 1000+ cached transcripts Solution: Strategic composite indexes on frequently queried columns Takeaway: Profile early, optimize often (10x speedup with proper indexes)
4. Browser Zoom Consistency
Challenge: Layout developed at 80% zoom looked different at 100% Solution: Set
html { font-size: 80%; }5. Error Message Security
Challenge: Stack traces leaking backend details Solution: Sanitize errors client-side, log details server-side Takeaway: Never trust error messages to be safe for users
6. TypeScript Configuration
Challenge: Next.js validator errors in .next/types Solution: Proper async/await patterns for Next.js 15 API routes Takeaway: Framework-specific patterns matter for type safety
🚀 Deployment Strategy
Infrastructure
Hosting: CapRover on Server Plus VPS
- Why CapRover: Self-hosted Heroku alternative, free (just VPS cost)
- Why Server Plus: Reliable, affordable, EU-based
Deployment Flow:
# 1. Build Docker images
docker build -t transcriptai-frontend:latest ./frontend
docker build -t transcriptai-backend:latest ./backend
# 2. Push to CapRover
./deploy-frontend.sh
./deploy-backend.sh
# 3. CapRover handles:
# - Container orchestration
# - SSL certificate renewal
# - Zero-downtime deployments
# - Health checks and auto-restartDatabase: PostgreSQL on same VPS
- Automated daily backups
- Replication for disaster recovery
- Connection pooling (25 max connections)
Reverse Proxy: Nginx
- SSL/TLS termination (Let's Encrypt)
- Gzip compression
- Static asset caching
- Rate limiting (1000 req/min)
CI/CD (Future Enhancement)
Planned GitHub Actions workflow:
name: Deploy to Production
on:
push:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
steps:
- Lint (ESLint, golangci-lint)
- Type check (TypeScript, Go)
- Unit tests (backend 85%+ coverage)
- Integration tests (API endpoints)
deploy:
needs: test
steps:
- Build Docker images
- Push to CapRover
- Run smoke tests
- Notify team (Slack/Discord)🌍 Production URLs
Primary: https://transcriptai.conn.digital Secondary: https://transcriptai.serverplus.org
Both URLs are live and fully functional, with the same features and performance.
📊 What's Next?
We're continuously improving TranscriptAI. Here's the roadmap:
Short-Term (Q4 2025)
- Multi-language transcript support (beyond English)
- Video chapter detection and navigation
- Browser extension (Chrome, Firefox)
- Batch processing (analyze multiple videos at once)
Mid-Term (Q1 2026)
- User accounts and history
- Custom AI prompt templates
- Transcript search across all saved videos
- API access for developers
- Webhook integrations (Zapier, Make)
Long-Term (Q2 2026+)
- Mobile apps (iOS, Android)
- Real-time transcript generation during live streams
- Collaboration features (teams, shared workspaces)
- Advanced analytics (topic clustering, trend analysis)
- Integration with note-taking apps (Notion, Obsidian)
💡 Why We Built This
As developers and content creators at Conn.Digital, we found ourselves constantly:
- Rewatching conference talks to find that one code snippet
- Copying quotes from podcasts for blog posts
- Trying to remember action items from client calls
- Wishing we could search inside video content like we search documents
We built TranscriptAI to solve our own problems. If it helps you too, that's the best outcome we could ask for.
🤝 Open Source Philosophy
While TranscriptAI is currently a proprietary project, I'm committed to:
- Sharing knowledge: Technical blog posts like this one
- Open discussions: Join our Discord for architecture talks
- Free beta access: No paywalls during development
- Future considerations: May open-source core components if there's community interest
Interested in the technical details? Have questions? Want to collaborate?
Reach out:
- Website: https://conn.digital
- Email: constantin@conn.digital
- LinkedIn: Connect with us and share your thoughts
- GitHub: Stay tuned for potential open-source releases
🎓 Technical Insights for Developers
Key Takeaways
- Proxy layers protect infrastructure: Never expose backend URLs to clients
- Multi-provider AI is smart: Avoid vendor lock-in, optimize costs
- Database indexing is critical: 10x performance gains with proper indexes
- TypeScript + Go = great combo: Type safety on both ends
- Accessibility matters: , WCAG compliance, etc.tsx
prefers-reduced-motion - Caching saves money: 95% cache hit rate = 95% fewer API calls
Code Snippets Worth Stealing
1. Next.js Server-Side Proxy Pattern:
// src/lib/server/backend-client.ts
const backendBaseUrl =
process.env.BACKEND_API_URL ??
process.env.NEXT_PUBLIC_BACKEND_URL ??
"http://localhost:8080";
export async function backendFetch(path: string, init?: RequestInit) {
const url = backendBaseUrl + path;
return fetch(url, { ...init, cache: "no-store" });
}2. Go AI Provider Interface:
type AIProvider interface {
GenerateSummary(ctx context.Context, transcript string, summaryType string) (*SummaryResult, error)
}
type OpenAIProvider struct { /* ... */ }
type GeminiProvider struct { /* ... */ }
func NewAIProvider() AIProvider {
provider := os.Getenv("AI_PROVIDER")
switch provider {
case "openai":
return &OpenAIProvider{}
case "google":
return &GeminiProvider{}
default:
return &GeminiProvider{}
}
}3. Database Query Optimization:
-- BEFORE (slow: ~15ms)
SELECT * FROM transcripts WHERE video_id = $1;
-- AFTER (fast: ~0.74ms)
CREATE INDEX idx_transcripts_video_id ON transcripts(video_id);
SELECT * FROM transcripts WHERE video_id = $1;
-- Now uses index scan instead of sequential scan🏆 Results
After 5 weeks of intensive development:
- ✅ Fully functional platform with 6 core features
- ✅ Production-ready infrastructure on CapRover
- ✅ Zero TypeScript/ESLint errors (clean codebase)
- ✅ High performance (1-2s average response time)
- ✅ Secure architecture (backend hidden, input validated)
- ✅ Accessible design (WCAG 2.1 AA compliant)
- ✅ Cost-effective (~$20/month operating cost)
- ✅ Scalable (ready for 10,000+ users)
🙏 Acknowledgments
Built with love by the Conn.Digital team.
Special thanks to:
- Next.js Team for an incredible framework
- Vercel for excellent documentation
- Shadcn for beautiful UI components
- Google for affordable AI (Gemini)
- CapRover for simplified deployment
- Open source community for inspiration and tools
📞 Get in Touch
Try TranscriptAI: https://transcriptai.conn.digital
Questions? Feedback? Collaboration opportunities?
- Website: https://conn.digital
- Email: constantin@conn.digital
- LinkedIn: Share your thoughts and connect
I'd love to hear from you!
