OmniVoice Studio

A tier · New this week

A fully local, open-source desktop app for voice cloning, video dubbing, and real-time dictation — zero cloud, zero subscription, 646 languages.


Kai's verdict

The feature surface is genuinely impressive for a solo open-source project — 646-language TTS, video dubbing, MCP integration, and a real desktop UI — but active beta status and fuzzy commercial licensing mean you should treat it as a powerful experiment, not a production workhorse yet. (Verdict pending Phi's full review.)

Strengths

Completely local processing — no API keys, no cloud account, no data sent to third parties
646 languages for TTS (vs. ElevenLabs' 32) and 99 languages for transcription via WhisperX
Zero-shot voice cloning from a 3-second clip, plus parametric voice design without any reference audio
Full video dubbing pipeline (transcribe → translate → re-voice → MP4) with Demucs background audio preservation
Built-in MCP Server exposes capabilities to Claude, Cursor, or any MCP client for automation

Weaknesses

Active beta — things break between releases; macOS builds aren't notarized yet, requiring a manual Gatekeeper workaround
Meaningful hardware requirements: CPU-only runs work, but TTS is roughly 3× slower than with a GPU; 16 GB RAM and 8+ GB VRAM are recommended for production use
Commercial licensing terms aren't finalized, creating uncertainty for business use cases

Best for

Privacy-conscious creators, developers, and localizers who want ElevenLabs-tier voice features without the cloud dependency, usage limits, or monthly bill.

Pricing

Free under the FSL for personal/non-commercial use; commercial pricing TBD

Released under the Functional Source License (FSL-1.1-ALv2): free for personal, educational, and non-commercial use. Each release converts to Apache 2.0 two years after its publication date. Commercial license pricing is still being finalized; quotes can be requested through an in-app page.
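To make the two-year conversion rule concrete, here is a minimal Python sketch that computes when a given release would fall under Apache 2.0. The function names and the example release date are hypothetical, and treating a 29 February release as converting on 1 March is an assumption made for illustration; the license text itself is the authority on the exact date.

```python
from datetime import date


def apache_conversion_date(published: date) -> date:
    """Return the date two years after a release's publication date."""
    try:
        return published.replace(year=published.year + 2)
    except ValueError:
        # 29 February has no counterpart two years later.
        # Assumption: fall back to 1 March, the later (more cautious) date.
        return date(published.year + 2, 3, 1)


def is_apache_licensed(published: date, today: date) -> bool:
    """True once the release has reached its conversion date."""
    return today >= apache_conversion_date(published)


# Hypothetical release date, for illustration only.
release = date(2025, 5, 1)
print(apache_conversion_date(release))
print(is_apache_licensed(release, date(2027, 4, 30)))
```

Because the conversion applies per release, an older version can already be Apache 2.0 while the latest one is still under FSL terms, so the check needs the publication date of the specific version in use.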

Alternatives worth knowing

ElevenLabs

The voice gold standard. Cloning + TTS + dubbing.

Cartesia

Ultra-low-latency voice. Built for realtime agents.

Play.ht

Enterprise-grade TTS with voice cloning.

Hume AI

Voice AI that reads + expresses emotion.

OpenAI Voice / Realtime

ChatGPT's voice + the Realtime API for developers.

Descript

Edit video + podcasts by editing the transcript.