AI / Full-Stack · December 2025 – Present · Built at Tessact (6-engineer team, $2M funded)

AI Video Repurposing Platform

Turns a 2-hour podcast into 30–40 social-ready clips in 1 hour

Built as the sole engineer on a 6-person team. Owned the entire product: system architecture, AI/LLM integration, full-stack development, ML service deployment, and production scaling. The CEO handled product requirements; I built everything else.

Editing time: 4 weeks → 1 hour
Cost per podcast: $1,000 → $5
Cost advantage: 200×
Output quality: 95% ready-to-post
Hours processed: 500+ since Jan 2026
Clips per podcast: 30–40

Overview

Tessact needed a flagship product to drive its February 2026 public launch. Manual podcast editing takes 4 weeks and costs $1,000+ per episode. I designed and shipped an AI platform that replaces that workflow — it takes a 2-hour podcast and produces 30–40 branded short-form clips in under 1 hour at $5 total. Since closed beta (January 31, 2026), it has processed 500+ hours of content for enterprise clients.


System Architecture

  • Designed a microservices system with four independently deployable services: TessactAI (FastAPI + PydanticAI) for LLM orchestration, Core Backend (Django + Celery/RabbitMQ) for job orchestration, a React/Next.js frontend for uploads and previews, and a Remotion rendering service on AWS Lambda.
  • Built a custom DAG-style job pipeline using Celery and RabbitMQ. Its 11 stages (including transcription, speaker diarization, face tracking, speaker name resolution, clip selection, enhancement generation, face cropping, reframing, rendering, and export) are scheduled by their dependencies, so stages that do not depend on each other run in parallel.
  • Each service is containerized with Docker and deployed to GCP Cloud Run (face tracking, AI service) or AWS Lambda (Remotion rendering), and auto-scales based on load.
  • Chose microservices over a monolith to enable GPU-based face tracking to scale independently from the standard-compute LLM and rendering services.
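
The dependency-driven scheduling described above can be sketched with Python's standard-library `graphlib`. This is an illustration only, not the production Celery pipeline: it uses the ten stage names listed above, while the dependency edges in `STAGE_DEPS` and the helper `parallel_waves` are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: stage -> set of stages it must wait for.
# Stage names come from the pipeline description; the edges are assumed.
STAGE_DEPS = {
    "transcription": set(),
    "speaker_diarization": set(),
    "speaker_name_resolution": {"transcription", "speaker_diarization"},
    "clip_selection": {"transcription"},
    "enhancement_generation": {"clip_selection", "speaker_name_resolution"},
    "face_tracking": {"clip_selection"},
    "face_cropping": {"face_tracking"},
    "reframing": {"face_cropping"},
    "rendering": {"reframing", "enhancement_generation"},
    "export": {"rendering"},
}


def parallel_waves(deps):
    """Group stages into waves; every stage in a wave has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # stages that can run right now
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished, unlocking successors
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(parallel_waves(STAGE_DEPS), start=1):
        print(number, wave)
```

With these assumed edges, transcription and diarization land in the same wave, as do face tracking and enhancement generation. In a Celery deployment, each wave would plausibly map to a `group` of tasks, with waves linked through `chain` or `chord`.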

LLM Orchestration

  • Built PydanticAI-based orchestration for three AI-driven features:
      (1) Clip Selection: the LLM reads the full podcast transcript and identifies 30–40 self-contained, social-media-ready segments with optimal start/end boundaries.
      (2) Speaker Name Resolution: the LLM identifies speaker names from conversational context (introductions, greetings) and maps generic labels ('Speaker 1' → 'John Doe') with confidence scoring.
      (3) Enhancement Generation: the LLM selects and times visual effects (intro cards, speaker IDs, quote pops, chapter titles, topic tags, image cards) based on transcript analysis and brand guidelines.
  • A/B tested OpenAI GPT-4/5 vs Google Gemini 2.5 Flash across 100+ real podcast samples. Gemini matched GPT output quality while costing 60% less. Switched permanently to Gemini.
  • Engineered structured prompts with typed output schemas, effect catalogs, timing constraints, and confidence scoring. Added validation layers to catch hallucinations before they hit the rendering pipeline.
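
A validation layer of the kind described above might look like the sketch below. It uses a standard-library dataclass as a stand-in for a typed output schema (the project itself uses PydanticAI). The field names, the length limits, and the `validate_clips` helper are all hypothetical and chosen only to show the idea of rejecting implausible LLM output before rendering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipCandidate:
    """Hypothetical shape of one LLM-proposed clip (field names are assumed)."""
    start_s: float
    end_s: float
    confidence: float
    title: str


def validate_clips(raw_clips, duration_s, min_len_s=5.0, max_len_s=180.0):
    """Split LLM-proposed clips into (valid, rejected) lists with reasons."""
    valid, rejected = [], []
    for raw in raw_clips:
        try:
            clip = ClipCandidate(
                start_s=float(raw["start_s"]),
                end_s=float(raw["end_s"]),
                confidence=float(raw["confidence"]),
                title=str(raw["title"]).strip(),
            )
        except (KeyError, TypeError, ValueError) as exc:
            # Missing fields or non-numeric values: the output is malformed.
            rejected.append((raw, f"malformed: {exc!r}"))
            continue
        length = clip.end_s - clip.start_s
        if clip.start_s < 0 or clip.end_s > duration_s:
            rejected.append((raw, "outside media duration"))
        elif not (min_len_s <= length <= max_len_s):
            rejected.append((raw, "clip length out of bounds"))
        elif not (0.0 <= clip.confidence <= 1.0):
            rejected.append((raw, "confidence not in [0, 1]"))
        elif not clip.title:
            rejected.append((raw, "empty title"))
        else:
            valid.append(clip)
    return valid, rejected
```

Timestamps beyond the end of the media are one of the easier hallucinations to catch this way, since the true duration is known before the LLM is called.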

Computer Vision & Face Tracking

  • Built face tracking on top of InsightFace (SCRFD-10G detector, GLINTR100 embeddings). Combines frame-to-frame tracking with ReID for stable speaker identity across angle changes, lighting shifts, and occlusion.
  • Implemented talk score computation to detect when a speaker is actively speaking (lip movement + audio correlation), enabling accurate speaker-aware reframing for multi-speaker podcasts.
  • Added protected shot detection: identifies frames with large-text overlays or graphics to prevent incorrect reframing.
  • Engineered a key optimization: run face tracking only on the final clip time ranges after LLM clip selection, not the full 2-hour video. This reduced face tracking compute by ~80% and cut processing costs proportionally.
  • Built cinematic scope detection to automatically remove black bars from letterboxed content before reframing.
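
The clip-range optimization above comes down to interval arithmetic: merge the selected clip ranges, then track faces only inside them. The sketch below is illustrative; `merge_ranges`, `tracked_fraction`, and the optional padding parameter are assumptions, not the project's actual functions.

```python
def merge_ranges(ranges, pad_s=0.0):
    """Merge overlapping (start, end) ranges in seconds, after optional padding."""
    padded = sorted((max(0.0, start - pad_s), end + pad_s) for start, end in ranges)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def tracked_fraction(ranges, duration_s, pad_s=0.0):
    """Fraction of the full video that still needs face tracking."""
    merged = merge_ranges(ranges, pad_s)
    covered = sum(min(end, duration_s) - start
                  for start, end in merged if start < duration_s)
    return covered / duration_s


if __name__ == "__main__":
    # Assumed layout: 40 non-overlapping 30-second clips in a 2-hour video.
    clips = [(i * 180.0, i * 180.0 + 30.0) for i in range(40)]
    print(tracked_fraction(clips, duration_s=7200.0))
```

Under that assumed layout, 1,200 of 7,200 seconds are tracked, roughly one sixth of the video. That is in the same range as the ~80% reduction reported above, although actual clip lengths and overlaps will vary.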

Cost Engineering

  • Reduced transcription costs by 72%: benchmarked AWS Transcribe ($1.44/hr, ~15% WER) against ElevenLabs Scribe v2 ($0.40/hr, ~10% WER for English and Indian languages). ElevenLabs was both cheaper and more accurate for the target content.
  • Reduced LLM cost by 60% by switching from OpenAI GPT-5 to Google Gemini 2.5 Flash after head-to-head quality testing.
  • Remotion rendering on AWS Lambda with parallel execution: each 30-second clip renders in ~30 seconds. Scales horizontally with zero idle cost.
  • Final platform cost: $5 per processed podcast vs $1,000+ for equivalent work from a manual editing agency.
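
The headline percentages can be checked directly from the figures quoted on this page. The helper name `percent_reduction` is just for illustration.

```python
def percent_reduction(before, after):
    """Percentage saved when a cost drops from `before` to `after`."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Transcription: AWS Transcribe $1.44/hr -> ElevenLabs Scribe v2 $0.40/hr.
    print(round(percent_reduction(1.44, 0.40), 1))  # about 72.2
    # Manual editing ($1,000) relative to the platform cost ($5).
    print(1000 / 5)  # 200.0
```

The first figure rounds to the 72% transcription saving, and the second matches the 200× cost advantage.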

DevOps & Engineering Practices

  • CI/CD pipeline with GitHub Actions: automated Ruff linting, pytest backend tests, Jest frontend tests, and deployment to GCP Cloud Run and AWS Lambda on merge to main.
  • Pre-commit hooks enforce code quality (Ruff, type checks) before every commit.
  • Error monitoring with Sentry — 97%+ crash-free sessions across frontend and backend. Alerts on job failure, latency spikes, and LLM error rates.
  • Used Claude Code for PR reviews and GitHub Copilot for accelerated development across the 2-month solo build.

Business Impact

  • Processed 500+ hours of podcast content since closed beta launch on January 31, 2026.
  • Drives Tessact's February 2026 public launch as the sole flagship feature, priced at $20/month (50–60 clips per month).
  • Enabled enterprise POC pipeline: supporting active POCs with US and European brands for custom brand kit generation (LLM-generated Remotion components that match client brand guidelines).
  • Built real-time highlight generation POC for Garena Free Fire livestreams: ingests live stream chunks and produces gaming highlights (headshots, kills, clutches) as instant short-form content.

Tech Stack

Python
TypeScript
FastAPI
Django
PydanticAI
React
Next.js 14
Remotion
OpenAI GPT-5
Google Gemini 2.5 Flash
InsightFace
ElevenLabs Scribe v2
Whisper
TransNet-V2
Docker
GCP Cloud Run
AWS Lambda
Celery
RabbitMQ
PostgreSQL
Redis
GitHub Actions
Sentry
FFmpeg
NVENC
