Voxtral
last release: March 23, 2026
powered by: Voxtral Small, Voxtral Mini, Voxtral Transcribe 2, Voxtral Realtime, Voxtral TTS
goblin vibe check:
solid pick if you need voice commands or audio q&a and you're already comfortable running mistral models locally
mistral's open speech understanding and voice model family for transcription, audio reasoning, live voice agents, and tts. useful when you want speech workflows that can run through an api or locally with open weights instead of being locked inside a closed voice stack.
context: 32k tokens
speed: sub-200ms
cost: $0.003 per min
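To put the per-minute price in perspective, here is a minimal sketch that turns an audio duration into an estimated bill. The rate is the one listed on this card. The function name `estimate_cost_usd` and the assumption of linear per-second billing with no rounding or minimum charge are illustrative only, not confirmed pricing rules.

```python
from decimal import Decimal

# Rate listed on this card; actual pricing may differ by model and over time.
RATE_PER_MINUTE_USD = Decimal("0.003")


def estimate_cost_usd(audio_seconds: float,
                      rate_per_minute: Decimal = RATE_PER_MINUTE_USD) -> Decimal:
    """Estimate the cost of processing audio of the given length.

    Assumption (hypothetical): billing is linear in duration, with no
    rounding up to whole minutes and no minimum charge.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    return minutes * rate_per_minute


if __name__ == "__main__":
    # One hour of audio at the listed rate.
    print(estimate_cost_usd(3600))
```

`Decimal` is used instead of floats so small per-minute rates do not pick up binary rounding noise when summed over many files.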
key features
Audio reasoning beyond plain transcription
13+ native languages
Realtime variant for live agents
Speaker diarization and timestamps
open speech understanding models for transcription, audio reasoning, and voice-agent workflows
spec & usage
Processes up to 30 minutes of audio for transcription, or up to 40 minutes for summarization and Q&A
Built on the Mistral Small 3.1 backbone with a causal audio encoder and transformer decoder
Apache 2.0 release that can run locally on consumer GPUs like an RTX 4090
Context biasing helps it lock onto niche technical terms and proper nouns
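The duration limits above matter when a recording is longer than a single pass allows. A minimal sketch of planning segment boundaries follows, assuming the 30 and 40 minute figures apply per request. The helper name `plan_segments` is hypothetical, and it cuts at fixed offsets, whereas a real pipeline would likely cut at silences to avoid splitting words.

```python
# Limits taken from the spec above, expressed in seconds.
TRANSCRIPTION_LIMIT_S = 30 * 60
UNDERSTANDING_LIMIT_S = 40 * 60  # summarization and Q&A


def plan_segments(total_seconds: float,
                  limit_seconds: float = TRANSCRIPTION_LIMIT_S
                  ) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds, each at most limit_seconds long.

    Hypothetical helper: it only computes boundaries and does not touch audio.
    """
    if total_seconds < 0:
        raise ValueError("total_seconds must be non-negative")
    if limit_seconds <= 0:
        raise ValueError("limit_seconds must be positive")

    segments = []
    start = 0.0
    while start < total_seconds:
        end = min(start + limit_seconds, float(total_seconds))
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    # A 75-minute recording split for transcription.
    for start, end in plan_segments(75 * 60):
        print(f"{start / 60:.0f} min to {end / 60:.0f} min")
```

Passing `UNDERSTANDING_LIMIT_S` as `limit_seconds` plans the longer segments allowed for summarization and Q&A instead.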
limitations
the family spans transcription, realtime, and tts models, so model choice matters
local deployment still requires enough hardware and integration work for real-time use
scope:
audio, language, tool, voice, search, agent, api, local, open-source, real-time
launch: July 15, 2025
last release: March 23, 2026