

Voxtral Transcribe 2 consists of two next-generation speech-to-text models offering state-of-the-art transcription quality, speaker diarization, and ultra-low latency. The family comprises Voxtral Mini Transcribe V2, for batch transcription, and Voxtral Realtime, for live applications; together they provide precise diarization and real-time transcription.
Key features include speaker diarization, which produces transcripts with speaker labels and precise start and end times; context biasing, which accepts up to 100 words or phrases to guide the model toward correct spellings; and word-level timestamps, which give precise start and end times for each word. The models support 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. They are robust to noise, maintaining accuracy in challenging acoustic environments, and can process recordings of up to 3 hours in a single request.
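Word-level timestamps and speaker labels are often post-processed into speaker turns for display, for example in meeting notes. The following is a minimal sketch assuming a hypothetical per-word record with `text`, `start`, `end`, and `speaker` fields; the actual Voxtral response format may differ, and the `max_gap` threshold is an illustrative choice, not a documented setting.

```python
from dataclasses import dataclass


# Hypothetical per-word record; the real Voxtral response schema may differ.
@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: str


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def group_into_turns(words: list[Word], max_gap: float = 1.0) -> list[Turn]:
    """Merge consecutive words from the same speaker into turns.

    A new turn starts when the speaker changes or when the silence
    between two words exceeds max_gap seconds.
    """
    turns: list[Turn] = []
    for w in sorted(words, key=lambda x: x.start):
        last = turns[-1] if turns else None
        if last and last.speaker == w.speaker and w.start - last.end <= max_gap:
            # Same speaker, short pause: extend the current turn.
            last.end = w.end
            last.text += " " + w.text
        else:
            # Speaker change or long pause: open a new turn.
            turns.append(Turn(w.speaker, w.start, w.end, w.text))
    return turns


if __name__ == "__main__":
    words = [
        Word("Hello", 0.00, 0.40, "speaker_1"),
        Word("everyone", 0.45, 0.90, "speaker_1"),
        Word("Hi", 1.20, 1.35, "speaker_2"),
        Word("thanks", 1.40, 1.80, "speaker_2"),
        Word("Let's", 4.00, 4.20, "speaker_2"),
        Word("start", 4.25, 4.60, "speaker_2"),
    ]
    for t in group_into_turns(words):
        print(f"[{t.start:.2f}-{t.end:.2f}] {t.speaker}: {t.text}")
```

Splitting on long pauses as well as speaker changes keeps a single speaker's separate remarks from collapsing into one block; in the example, speaker_2's words after the 2.2 s pause start a new turn.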
Voxtral Realtime uses a novel streaming architecture that transcribes audio as it arrives, with latency configurable down to under 200 ms. Unlike approaches that adapt offline models by processing audio in chunks, this native streaming design unlocks a new class of voice-first applications.
The models support voice workflows across diverse applications: meeting intelligence (transcribing multilingual recordings with speaker diarization), voice agents and virtual assistants (building conversational AI with sub-200 ms transcription latency), contact center automation (transcribing calls in real time), media and broadcast (generating live multilingual subtitles), and compliance and documentation (monitoring and transcribing interactions for regulatory compliance).
Voxtral Transcribe 2 targets developers building live applications, voice agents, and meeting transcription systems. Both models support GDPR- and HIPAA-compliant deployments through secure on-premises or private cloud setups. The models integrate with LLM and TTS pipelines to build responsive voice interfaces and can be deployed on edge devices for privacy-first applications.
Beyond developers, Voxtral Transcribe 2 serves enterprises that need contact center automation, media and broadcast companies that need live subtitling, and organizations with compliance documentation requirements. It is aimed at teams working on conversational AI, virtual assistants, and voice-first applications across industries including tech, customer service, media, and regulated sectors.