The Video Engine.
An AI football documentary engine. Give it a title and a director's brief, and 15 chained agents produce the full package: research, blueprint, storyboard, narrated script, animated graphics, a clip sourcing sheet, and a timeline editor to ship from.
Overview.
"The engine takes a single title, such as Why Brazil Stopped Producing Playmakers, and a director's brief, and outputs a 10–15 minute football documentary that feels directed, continuous, and visually authored rather than templated together."
The Python side decides what to make. Fifteen agents run in sequence behind a Flask UI, all coordinated by orchestrator.py. The entity and research agents pull background from Wikipedia and Google News, and an analysis agent shapes a director's brief. The script, narration, graphics, player_image, motion, music_selector, and production agents then build out the storyboard, score, and clip sourcing list.
The Remotion side decides how it looks. A separate React project renders 30+ animated graphic templates (stat bars, radars, lineups, tactical boards, transfer records, league tables) driven by structured props the engine emits. ElevenLabs handles narration TTS; a centralised WorldStateRoot lets graphics share an infinite spatial canvas so consecutive scenes feel connected instead of resetting. The Flask UI then exposes a 5-step pipeline (title → context → blueprint → storyboard → render) plus a studio grid and a 4-track timeline editor for review and export.
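As a rough illustration of "structured props the engine emits", the sketch below serialises one graphic scene to JSON that a React template could read. The field names (template, duration_frames, props) and the "stat_bar" id are assumptions made for this example, not the engine's actual schema, and the values are placeholders.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class GraphicScene:
    """One graphic scene as the engine might describe it (hypothetical schema)."""
    template: str                 # assumed template id, e.g. "stat_bar"
    duration_frames: int          # assumed: scene length in frames
    props: dict = field(default_factory=dict)  # template-specific values


def to_props_json(scene: GraphicScene) -> str:
    """Serialise a scene to a JSON string with stable key order."""
    return json.dumps(asdict(scene), sort_keys=True)


# Placeholder values only; no real statistics are implied.
example = GraphicScene("stat_bar", 90, {"label": "placeholder", "value": 1})
payload = to_props_json(example)
```

Keeping the payload as plain JSON means the Python and React halves only need to agree on a schema, not on shared code.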
Real renders.
From the engine.
From title to export.
The pipeline is a 5-step UI on top of a deterministic agent chain. Each step lands real artifacts on disk under output/<safe_name>/ so a run can be paused, reviewed, and resumed.
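A minimal sketch of how a run directory under output/<safe_name>/ could support pause and resume. The slug rule and the blueprint.json and storyboard.json filenames are assumptions for illustration. Only context.md and facts.md are named in this document.

```python
import re
from pathlib import Path

# Step -> artifact filename. context.md and facts.md come from the text;
# the two .json names are hypothetical.
ARTIFACTS = {
    "context": "context.md",
    "facts": "facts.md",
    "blueprint": "blueprint.json",
    "storyboard": "storyboard.json",
}


def safe_name(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs to underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return slug or "untitled"


def run_dir(title: str, root: Path = Path("output")) -> Path:
    """Directory where a run's artifacts would live."""
    return root / safe_name(title)


def completed_steps(directory: Path) -> list[str]:
    """Steps whose artifact already exists on disk, in pipeline order."""
    return [step for step, name in ARTIFACTS.items() if (directory / name).exists()]
```

A resumed run would call completed_steps(run_dir(title)) and skip any step already listed there.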
Title + director's brief
User enters a title; entity_agent extracts the subject, research_agent pulls Wikipedia + Google News, and an LLM drafts the director's brief.
Editable fact checklist
The brief is parsed into structured facts the user can tick on or off. Checked items become MUST INCLUDE constraints downstream, written to context.md + facts.md.
Act-by-act structure
An LLM lays out the 5-act blueprint with required facts injected as constraints. The user adjusts emphasis and act count before storyboard generation runs.
70–90 scenes, drag & drop
The script agent expands the blueprint into a full storyboard. A drag-drop editor lets the user reorder, splice, and approve before the agent chain runs end-to-end via orchestrator.py.
Script · narration · graphics · clips
Sequential pipeline: script_agent → narration_agent (ElevenLabs TTS) → graphics_agent (Remotion renders) → production_agent (clip sourcing sheet).
Review grid & 4-track timeline
Two interfaces: a render grid for approve / reject / re-render, and a 4-track timeline editor for splicing graphics, narration, music, and sourced clips before final export.
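The steps above can be sketched as a checklist that turns into MUST INCLUDE constraints, followed by a fixed chain of stages. This is an illustrative sketch only. The stage functions are stubs that borrow the agent names from the text, and the state keys and the Fact class are assumptions. Real agents would call an LLM, ElevenLabs, and Remotion.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Fact:
    text: str
    checked: bool = True  # unticked facts are dropped from the constraints


def must_include(facts: list[Fact]) -> str:
    """Render checked facts as a MUST INCLUDE block for downstream prompts."""
    required = [f.text for f in facts if f.checked]
    if not required:
        return ""
    return "\n".join(["MUST INCLUDE:"] + [f"- {t}" for t in required])


Stage = Callable[[dict], dict]


# Stub stages: each returns a new state dict with one extra (hypothetical) key.
def script_agent(state: dict) -> dict:
    return {**state, "script": "draft script"}


def narration_agent(state: dict) -> dict:
    return {**state, "narration": "narration audio (stub)"}


def graphics_agent(state: dict) -> dict:
    return {**state, "graphics": "rendered graphics (stub)"}


def production_agent(state: dict) -> dict:
    return {**state, "clip_sheet": "clip sourcing sheet (stub)"}


STAGES: list[tuple[str, Stage]] = [
    ("script_agent", script_agent),
    ("narration_agent", narration_agent),
    ("graphics_agent", graphics_agent),
    ("production_agent", production_agent),
]


def run_chain(state: dict, stages: list[tuple[str, Stage]] = STAGES) -> dict:
    """Run the stages strictly in order, recording which ones completed."""
    completed = []
    for name, stage in stages:
        state = stage(state)
        completed.append(name)
    return {**state, "completed": completed}
```

Because every stage takes and returns the same state dictionary, the order is explicit in STAGES and a single stage can be re-run in isolation during review.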
Architecture.
The system has two halves. The Python engine (Flask + Groq + ElevenLabs) decides what to make: research, script, narration, clip sourcing. The Remotion project decides how it looks: 30+ React graphic templates rendered to MP4.
A central WorldStateRoot in VideoSequence.tsx keeps consecutive graphics on a shared spatial canvas (cameraX = 0 → 1920 → 3840…), so a sequence of stat cards reads as one continuous camera move rather than 8 hard cuts. _break_data_runs in the orchestrator reorders scenes to avoid 3+ consecutive pure-stat blocks.
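One way a reordering pass like _break_data_runs might work is shown below. It is a sketch of the general idea, not the orchestrator's actual code. The "kind" field and the "stat" label are assumptions. Whenever a third consecutive stat scene is about to be emitted, the next non-stat scene is pulled forward instead.

```python
def break_data_runs(scenes: list[dict], max_run: int = 2) -> list[dict]:
    """Reorder scenes so no more than max_run stat scenes appear in a row,
    as long as a non-stat scene is still available to pull forward."""
    pending = list(scenes)   # copy, so the caller's list is left untouched
    result: list[dict] = []
    run = 0                  # length of the current run of stat scenes
    while pending:
        idx = 0
        if run >= max_run and pending[0]["kind"] == "stat":
            # Pull the nearest non-stat scene forward; fall back to index 0
            # when only stat scenes remain.
            idx = next(
                (i for i, s in enumerate(pending) if s["kind"] != "stat"), 0
            )
        scene = pending.pop(idx)
        run = run + 1 if scene["kind"] == "stat" else 0
        result.append(scene)
    return result


kinds = ["stat", "stat", "stat", "stat", "clip", "clip"]
ordered = break_data_runs([{"id": i, "kind": k} for i, k in enumerate(kinds)])
```

In this example the first clip moves up to sit between the second and third stat scenes, so the longest stat run is two. If the storyboard ends with only stat scenes left, the run cannot be broken and the scenes stay in order.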
30+ templates rendered through a shared WorldStateRoot with per-scene cameraX offsets, so consecutive graphics read as one continuous camera move.
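The per-scene offsets follow directly from the 0 → 1920 → 3840 progression: scene i sits at i × 1920 on the shared canvas. The sketch below computes those offsets in Python for illustration. The actual logic lives in the React project, and the linear interpolation between scenes is an assumption, since the text does not describe the easing used.

```python
FRAME_WIDTH = 1920  # horizontal step between consecutive scenes, from the text


def camera_offsets(scene_count: int, frame_width: int = FRAME_WIDTH) -> list[int]:
    """cameraX for each scene: 0, 1920, 3840, ..."""
    return [i * frame_width for i in range(scene_count)]


def camera_x_at(progress: float, from_index: int, to_index: int,
                frame_width: int = FRAME_WIDTH) -> float:
    """Camera position part-way through a move between two scenes.

    progress is clamped to [0, 1]; linear easing is assumed for simplicity.
    """
    t = min(max(progress, 0.0), 1.0)
    start = from_index * frame_width
    end = to_index * frame_width
    return start + (end - start) * t
```

Halfway through the move from the first scene to the second, camera_x_at(0.5, 0, 1) gives 960.0, so both graphics are partly in frame and the transition reads as a pan rather than a cut.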