AI Voiceover · Local Kokoro TTS · No API Costs
Add Voiceover to Your Video
Type a script, pick from 14 voices, get a narrated MP4. Background music auto-ducks under your narration so the voice sits cleanly on top. Local TTS: your script never leaves our server.
Unlimited voiceovers · Cancel anytime
How it works
Drop your video
.mp4 · .mov · .m4v · .webm
Type your script
Up to 500 characters · 14 voices
Download narrated MP4
Music auto-ducks under speech
Why use this
Studio-quality narration without the studio bill
14 voices, all English
5 American female (Bella, Nicole, Sarah, Sky, Kore), 4 American male (Adam, Michael, Onyx, Eric), 3 British female (Alice, Emma, Lily), 2 British male (Daniel, George). Pick the one that fits your brand.
Auto-ducked music
When voiceover and background music are both present, FFmpeg's sidechaincompress filter drops the music ~8 dB during speech and lifts it back up between phrases. It sounds like a podcast or radio ad: voice on top, music underneath.
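For the technically curious, here is a minimal sketch of how a ducking mix like this could be wired up with FFmpeg, assuming three inputs: the video, the music track, and the generated voiceover WAV. `build_duck_command` is a hypothetical helper, and the threshold, ratio, attack and release values are illustrative assumptions, not the settings this service uses. The chain follows the pattern in FFmpeg's own sidechaincompress example: `asplit` copies the voice so one copy keys the compressor and the other is mixed back in.

```python
from typing import List


def build_duck_command(
    video: str,
    music: str,
    voice: str,
    output: str,
    threshold: float = 0.05,    # assumed: linear sidechain threshold (FFmpeg accepts ~0.00098..1)
    ratio: float = 8.0,         # assumed: compression ratio (FFmpeg accepts 1..20)
    attack_ms: float = 20.0,    # assumed: how quickly the music dips when speech starts
    release_ms: float = 300.0,  # assumed: how quickly the music recovers between phrases
) -> List[str]:
    """Return an ffmpeg argv list. Input order: 0 = video, 1 = music, 2 = voice."""
    filtergraph = (
        # Duplicate the voice: [sc] keys the compressor, [vo] is mixed into the output.
        "[2:a]asplit=2[sc][vo];"
        # Compress the music (first input) whenever the voice (second input) is loud.
        f"[1:a][sc]sidechaincompress=threshold={threshold}:ratio={ratio}"
        f":attack={attack_ms}:release={release_ms}[ducked];"
        # Sum the ducked music and the voice into a single audio track.
        "[ducked][vo]amix=inputs=2:duration=longest[aout]"
    )
    return [
        "ffmpeg", "-y",
        "-i", video, "-i", music, "-i", voice,
        "-filter_complex", filtergraph,
        "-map", "0:v", "-map", "[aout]",  # keep the original picture, replace the audio
        "-c:v", "copy", "-c:a", "aac",
        "-shortest",                       # stop when the shortest mapped stream ends
        output,
    ]
```

The list can be passed straight to `subprocess.run(cmd, check=True)`. Keeping it as an argv list rather than a shell string avoids quoting problems with the semicolons and brackets in the filtergraph.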
Pull script from video
If your video has on-screen text, click "Pull script from video" — Tesseract OCRs the visible text and pre-fills the script box. Edit before rendering. Useful for re-narrating existing kinetic-typography content.
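Tesseract returns raw text for each frame it reads, so a caption that stays on screen across several sampled frames comes back several times. One plausible way to turn that into a pre-filled script, assuming the OCR text for each sampled frame is already available as a string: drop repeated lines (ignoring case and extra whitespace), join the rest in order, and trim to the 500-character script limit at a word boundary. `merge_ocr_frames` is a hypothetical helper, not this service's code.

```python
from typing import Iterable, List, Set

MAX_SCRIPT_CHARS = 500  # the script box limit stated on this page


def merge_ocr_frames(frame_texts: Iterable[str], limit: int = MAX_SCRIPT_CHARS) -> str:
    """Hypothetical: merge per-frame OCR text into one deduplicated script."""
    seen: Set[str] = set()
    lines: List[str] = []
    for text in frame_texts:
        for raw in text.splitlines():
            line = " ".join(raw.split())  # collapse OCR whitespace noise
            key = line.lower()            # treat "Buy now" and "BUY NOW" as the same caption
            if line and key not in seen:
                seen.add(key)
                lines.append(line)
    script = " ".join(lines)
    if len(script) <= limit:
        return script
    # Cut at the last space that fits so no word is split in half.
    cut = script.rfind(" ", 0, limit + 1)
    return script[: cut if cut > 0 else limit]
```

The result is only a starting point; the user still edits it before rendering.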
Your script stays private
Kokoro TTS runs entirely on our server. Your script never goes to OpenAI, ElevenLabs, Google, Azure, or any other voice service. No tracking, no profiling, no data sharing.
No per-character bill
$5/month flat covers unlimited voiceovers. ElevenLabs Creator is $22/mo plus per-character billing. OpenAI TTS is $15 per 1M characters. We're flat-rate forever.
Fast generation
On our CPU box, Kokoro needs about 1.2× the length of the audio to generate it: a 200-char script (~20 seconds of speech) is ready in ~25 seconds. Total render overhead is typically under a minute.
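The timing figures work out from two numbers: a real-time factor (generation time divided by audio duration) of about 1.2, and roughly 10 characters of script per second of speech, which is what the 200-character ≈ 20-second example implies. A back-of-envelope estimator under exactly those two assumptions; real speed varies with voice, punctuation and server load, and the constant names are illustrative.

```python
MAX_SCRIPT_CHARS = 500
CHARS_PER_SECOND = 10.0  # assumption: 200 chars ≈ 20 s of speech
REAL_TIME_FACTOR = 1.2   # assumption: generation time / audio duration on the CPU box


def estimate_generation_seconds(num_chars: int) -> float:
    """Rough TTS generation time, in seconds, for a script of num_chars characters."""
    if not 0 <= num_chars <= MAX_SCRIPT_CHARS:
        raise ValueError(f"script must be 0..{MAX_SCRIPT_CHARS} characters")
    speech_seconds = num_chars / CHARS_PER_SECOND
    return speech_seconds * REAL_TIME_FACTOR


if __name__ == "__main__":
    for n in (200, 500):
        print(f"{n} chars -> ~{estimate_generation_seconds(n):.0f} s")
```

By this estimate a 200-character script takes about 24 seconds, and a script at the full 500-character limit about a minute, before mixing and encoding.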
FAQ
Adding Voiceover to Video — Common Questions
How does the voiceover feature work?
Type a script (up to 500 characters) and pick one of 14 voices. We synthesize the audio with Kokoro v1.0, an open-source TTS model based on StyleTTS 2, running locally on our server. The generated WAV is mixed into your video alongside any background music you've added.
What voices are available?
14 voices total: 5 American female (Bella, Nicole, Sarah, Sky, Kore), 4 American male (Adam, Michael, Onyx, Eric), 3 British female (Alice, Emma, Lily), and 2 British male (Daniel, George). All English. Quality is comparable to ElevenLabs for short narration scripts.
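Kokoro's published voice list names voices with a two-letter prefix: accent (`a` American, `b` British) plus gender (`f`/`m`), for example `af_bella` or `bm_george`. The sketch below shows how a request could be validated and the display names mapped onto those IDs. Treat the exact IDs as an assumption based on Kokoro's voice list, and `resolve_request` as a hypothetical helper rather than this service's code.

```python
from typing import Dict, Tuple

MAX_SCRIPT_CHARS = 500

# Display name -> assumed Kokoro voice ID (prefix: a/b = American/British, f/m = female/male).
VOICES: Dict[str, str] = {
    "Bella": "af_bella", "Nicole": "af_nicole", "Sarah": "af_sarah",
    "Sky": "af_sky", "Kore": "af_kore",
    "Adam": "am_adam", "Michael": "am_michael", "Onyx": "am_onyx", "Eric": "am_eric",
    "Alice": "bf_alice", "Emma": "bf_emma", "Lily": "bf_lily",
    "Daniel": "bm_daniel", "George": "bm_george",
}


def resolve_request(script: str, voice_name: str) -> Tuple[str, str, str]:
    """Validate a voiceover request and return (text, voice_id, lang_code)."""
    text = script.strip()
    if not text:
        raise ValueError("script is empty")
    if len(text) > MAX_SCRIPT_CHARS:
        raise ValueError(f"script is {len(text)} chars; limit is {MAX_SCRIPT_CHARS}")
    try:
        voice_id = VOICES[voice_name]
    except KeyError:
        raise ValueError(f"unknown voice: {voice_name!r}") from None
    lang_code = voice_id[0]  # 'a' (American) or 'b' (British) accent code
    return text, voice_id, lang_code
```

In the open-source `kokoro` Python package, the accent code and voice ID correspond to `KPipeline(lang_code=...)` and the pipeline's `voice=` argument; check the package documentation for the version you run.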
Does this cost extra per character?
No. Kokoro TTS runs entirely on our server — no third-party API calls, no per-character billing. Your $5/month subscription covers unlimited voiceovers.
Can I record my own voice instead of using AI?
Yes — upload your own audio file in the music panel. The mixing pipeline treats it like any other audio track, so it'll play alongside your video.
What if I don't know what to say?
If your input is a video with text on screen (kinetic-typography animation, marketing video with overlays), click the "Pull script from video" button. Tesseract OCRs the on-screen text and pre-fills the script field for you. Edit before rendering if needed.
Will the music drown out the voiceover?
No. When both are present, music is automatically ducked under speech by FFmpeg's sidechaincompress filter. The music drops roughly 8 dB whenever the voice signal exceeds the compressor threshold and lifts back up between phrases.
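The 8 dB figure is the net effect of the compressor settings, not a single knob. For an idealized hard-knee compressor, gain reduction equals how far the sidechain (the voice) rises above the threshold, multiplied by (1 − 1/ratio). FFmpeg's sidechaincompress uses a soft knee by default and smooths gain changes with attack and release, so real ducking near the threshold is gentler than this. The helper names below are hypothetical; the formulas are the standard static compressor curve.

```python
import math


def lin_to_db(amplitude: float) -> float:
    """Convert a linear amplitude (e.g. a sidechaincompress threshold) to dBFS."""
    return 20.0 * math.log10(amplitude)


def gain_reduction_db(sidechain_db: float, threshold_db: float, ratio: float) -> float:
    """Static gain reduction of an idealized hard-knee compressor, in dB."""
    overshoot = sidechain_db - threshold_db
    if overshoot <= 0.0:
        return 0.0  # voice below threshold: music passes through untouched
    return overshoot * (1.0 - 1.0 / ratio)


def overshoot_for_reduction(target_db: float, ratio: float) -> float:
    """How far above the threshold the voice must sit to duck the music by target_db."""
    if ratio <= 1.0:
        raise ValueError("ratio must be > 1 for any gain reduction")
    return target_db / (1.0 - 1.0 / ratio)
```

For example, a linear threshold of 0.05 is about −26 dBFS. At a ratio of 8, the voice has to peak about 9.1 dB above that (around −17 dBFS) to pull the music down by 8 dB.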
What input formats are supported?
MP4, MOV, M4V, and WebM video files. You can also add a voiceover to HTML animations (Claude Design exports, Lottie, GSAP, CSS), which is perfect for narrating a silent kinetic-typography piece.
How long does voiceover generation take?
On our 2-core CPU box, Kokoro needs roughly 1.2× the length of the audio to generate it, so a 200-character script (~20 seconds of speech) is ready in ~25 seconds. Total render overhead is typically under 2 minutes.
How much does it cost?
$5 per month, unlimited renders, unlimited voiceover characters. Cancel anytime.
Also available