AI / Full-Stack·December 2025 – Present·Built at Tessact (6-engineer team, $2M funded)

AI Video Repurposing Platform

Turns a 2-hour podcast into 30–40 social-ready clips in 1 hour

Built as the sole engineer on this product within a 6-person team. Owned the entire product: system architecture, AI/LLM integration, full-stack development, ML service deployment, and production scaling. The CEO handled product requirements; I built everything else.

Editing time

4 weeks → 1 hour

Cost per podcast

$1,000 → $5

Cost advantage

200×

Output quality

95% ready-to-post

Hours processed

500+ since Jan 2026

Clips per podcast

30–40

Overview

Tessact needed a flagship product to drive its February 2026 public launch. Manual podcast editing takes 4 weeks and costs $1,000+ per episode. I designed and shipped an AI platform that replaces that workflow: it takes a 2-hour podcast and produces 30–40 branded short-form clips in under 1 hour for about $5 total. Since the closed beta opened on January 31, 2026, it has processed 500+ hours of content for enterprise clients.

System Architecture

Designed a microservices system with four independently deployable services: TessactAI (FastAPI + PydanticAI) for LLM orchestration, Core Backend (Django + Celery/RabbitMQ) for job orchestration, a React/Next.js frontend for uploads and previews, and a Remotion rendering service on AWS Lambda.
Built a custom DAG-style job pipeline using Celery and RabbitMQ. Its 11 stages, including transcription, speaker diarization, face tracking, speaker name resolution, clip selection, enhancement generation, face cropping, reframing, rendering, and export, are scheduled by explicit dependencies, so stages that do not depend on each other run in parallel.
Each service is containerized with Docker and deployed to GCP Cloud Run (face tracking, AI service) or AWS Lambda (Remotion rendering), and each auto-scales with load.
Chose microservices over a monolith so that GPU-based face tracking can scale independently of the standard-compute LLM and rendering services.
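The dependency-driven scheduling described above can be sketched roughly as follows. This is a minimal illustration using Python's standard-library graphlib, not the Celery implementation itself. The edges in STAGE_DEPS are assumptions made for the example (the text names the stages but not their dependencies), and parallel_batches is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: stage -> stages it must wait for.
# Stage names follow the text above; the edges are illustrative assumptions.
STAGE_DEPS = {
    "transcription": set(),
    "speaker_diarization": set(),
    "speaker_name_resolution": {"transcription", "speaker_diarization"},
    "clip_selection": {"transcription"},
    "face_tracking": {"clip_selection"},
    "enhancement_generation": {"clip_selection", "speaker_name_resolution"},
    "face_cropping": {"face_tracking"},
    "reframing": {"face_cropping"},
    "rendering": {"reframing", "enhancement_generation"},
    "export": {"rendering"},
}


def parallel_batches(deps):
    """Group stages into waves; all stages in one wave can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Every stage returned here has all of its dependencies finished.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for wave_number, wave in enumerate(parallel_batches(STAGE_DEPS), start=1):
        print(wave_number, wave)
```

In a Celery-based system, each wave would typically map onto a group of tasks, with the waves chained in order; whether the production pipeline is structured exactly this way is not stated here.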

LLM Orchestration

Built PydanticAI-based orchestration for three AI-driven features: (1) Clip Selection — LLM reads the full podcast transcript and identifies 30–40 self-contained, social-media-ready segments with optimal start/end boundaries. (2) Speaker Name Resolution — LLM identifies speaker names from conversational context (introductions, greetings) and maps generic labels ('Speaker 1' → 'John Doe') with confidence scoring. (3) Enhancement Generation — LLM selects and times visual effects (intro cards, speaker IDs, quote pops, chapter titles, topic tags, image cards) based on transcript analysis and brand guidelines.
A/B tested OpenAI GPT-4/5 vs Google Gemini 2.5 Flash across 100+ real podcast samples. Gemini matched GPT output quality while costing 60% less. Switched permanently to Gemini.
Engineered structured prompts with typed output schemas, effect catalogs, timing constraints, and confidence scoring. Added validation layers to catch hallucinations before they hit the rendering pipeline.
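A validation layer of the kind described above might look roughly like this. The sketch uses standard-library dataclasses rather than PydanticAI so that it stays self-contained. The ClipCandidate fields, the validate_clips helper, and every threshold (minimum and maximum clip length, confidence floor) are assumptions chosen for illustration, not the production schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipCandidate:
    """Hypothetical shape of one LLM-proposed clip."""
    start_s: float
    end_s: float
    title: str
    confidence: float


def validate_clips(clips, transcript_duration_s,
                   min_len_s=10.0, max_len_s=90.0, min_confidence=0.5):
    """Split LLM output into accepted clips and (clip, problems) rejections."""
    accepted, rejected = [], []
    for clip in clips:
        problems = []
        length = clip.end_s - clip.start_s
        # Timestamps pointing outside the transcript are a typical hallucination.
        if clip.start_s < 0 or clip.end_s > transcript_duration_s:
            problems.append("outside transcript")
        if length < min_len_s or length > max_len_s:
            problems.append("bad length")
        if not 0.0 <= clip.confidence <= 1.0:
            problems.append("confidence out of range")
        elif clip.confidence < min_confidence:
            problems.append("low confidence")
        if problems:
            rejected.append((clip, problems))
        else:
            accepted.append(clip)
    return accepted, rejected
```

Only the accepted clips would move on to rendering; rejected ones could be logged or sent back to the model for another attempt, depending on how strict the pipeline needs to be.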

Computer Vision & Face Tracking

Built face tracking on top of InsightFace (SCRFD-10G detector, GLINTR100 embeddings). Combines frame-to-frame tracking with ReID for stable speaker identity across angle changes, lighting shifts, and occlusion.
Implemented talk score computation to detect when a speaker is actively speaking (lip movement + audio correlation), enabling accurate speaker-aware reframing for multi-speaker podcasts.
Added protected shot detection: identifies frames with large-text overlays or graphics to prevent incorrect reframing.
Engineered a key optimization: run face tracking only on the final clip time ranges after LLM clip selection, not the full 2-hour video. This reduced face tracking compute by ~80% and cut processing costs proportionally.
Built cinematic scope detection to automatically remove black bars from letterboxed content before reframing.
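The saving from tracking only the selected clip ranges can be illustrated with a small interval calculation. This is a sketch under stated assumptions: merge_ranges and tracked_fraction are hypothetical helpers, and the example numbers (36 clips of 40 seconds each in a 2-hour video) are invented to show how a figure near 80% can arise, not measured values.

```python
def merge_ranges(ranges, padding_s=0.0):
    """Pad each (start, end) range and merge any that overlap or touch."""
    padded = sorted((max(0.0, s - padding_s), e + padding_s) for s, e in ranges)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def tracked_fraction(ranges, video_duration_s, padding_s=0.0):
    """Fraction of the video that still needs face tracking."""
    merged = merge_ranges(ranges, padding_s)
    covered = sum(
        min(end, video_duration_s) - min(start, video_duration_s)
        for start, end in merged
    )
    return covered / video_duration_s


if __name__ == "__main__":
    # Illustrative only: 36 non-overlapping 40-second clips in a 7200-second video.
    clip_ranges = [(i * 200, i * 200 + 40) for i in range(36)]
    fraction = tracked_fraction(clip_ranges, video_duration_s=7200)
    print(f"tracked: {fraction:.0%}, skipped: {1 - fraction:.0%}")
```

Merging before tracking matters when clips overlap or sit close together, since each frame should be processed at most once.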

Cost Engineering

Reduced transcription costs 72%: benchmarked AWS Transcribe ($1.44/hr, ~15% WER) vs ElevenLabs Scribe v2 ($0.40/hr, ~10% WER for English and Indian languages). ElevenLabs was both cheaper and more accurate for the target content.
LLM cost reduced 60% by switching from OpenAI GPT-5 to Google Gemini 2.5 Flash after head-to-head quality testing.
Remotion rendering on AWS Lambda with parallel execution: each 30-second clip renders in ~30 seconds. Scales horizontally with zero idle cost.
Final platform cost: about $5 per processed 2-hour podcast vs $1,000+ for equivalent manual editing agency work.
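The percentages quoted above follow directly from the listed prices. The short check below reproduces them; percent_reduction is a hypothetical helper name, and the inputs are the per-hour transcription prices and the $1,000 and $5 per-podcast figures stated in this write-up.

```python
def percent_reduction(before, after):
    """Relative saving, in percent, when a cost drops from before to after."""
    return (before - after) / before * 100


# AWS Transcribe at $1.44/hr vs ElevenLabs Scribe v2 at $0.40/hr.
transcription_saving = percent_reduction(1.44, 0.40)  # about 72.2

# Manual editing at $1,000 per podcast vs $5 on the platform.
cost_multiple = 1000 / 5  # 200.0

if __name__ == "__main__":
    print(f"transcription saving: {transcription_saving:.1f}%")
    print(f"cost advantage: {cost_multiple:.0f}x")
```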

DevOps & Engineering Practices

CI/CD pipeline with GitHub Actions: automated Ruff linting, pytest backend tests, Jest frontend tests, and deployment to GCP Cloud Run and AWS Lambda on merge to main.
Pre-commit hooks enforce code quality (Ruff, type checks) before every commit.
Error monitoring with Sentry — 97%+ crash-free sessions across frontend and backend. Alerts on job failure, latency spikes, and LLM error rates.
Used Claude Code for PR reviews and GitHub Copilot for accelerated development across the 2-month solo build.

Business Impact

Processed 500+ hours of podcast content since closed beta launch on January 31, 2026.
Drives Tessact's February 2026 public launch as the sole flagship feature, priced at $20/month (50–60 clips per month).
Enabled an enterprise POC pipeline, with active POCs for US and European brands covering custom brand kit generation (LLM-generated Remotion components that match client brand guidelines).
Built real-time highlight generation POC for Garena Free Fire livestreams: ingests live stream chunks and produces gaming highlights (headshots, kills, clutches) as instant short-form content.

Tech Stack

Python

TypeScript

FastAPI

Django

PydanticAI

React

Next.js 14

Remotion

OpenAI GPT-5

Google Gemini 2.5 Flash

InsightFace

ElevenLabs Scribe v2

Whisper

TransNet-V2

Docker

GCP Cloud Run

AWS Lambda

Celery

RabbitMQ

PostgreSQL

Redis

GitHub Actions

Sentry

FFmpeg

NVENC
