Voxtral

last release: March 23, 2026

powered by: Voxtral Small, Voxtral Mini, Voxtral Transcribe 2, Voxtral Realtime, Voxtral TTS

goblin vibe check:

solid pick if you need voice commands or audio q&a and you're already comfortable running mistral models locally

mistral's open speech understanding and voice model family for transcription, audio reasoning, live voice agents, and tts. useful when you want speech workflows that can run through an api or locally with open weights instead of being locked inside a closed voice stack.

context: 32k tokens

speed: sub-200ms

cost: $0.003 per min
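At a flat per-minute rate, budgeting a transcription job is simple arithmetic. A minimal sketch, assuming the $0.003-per-minute figure above applies uniformly with no per-request rounding or minimums (actual billing granularity may differ); `estimate_cost_usd` is a hypothetical helper, not part of any Mistral SDK.

```python
# Assumed flat rate from the listing; real billing may round or vary by model.
RATE_USD_PER_MIN = 0.003

def estimate_cost_usd(audio_seconds: float, rate_per_min: float = RATE_USD_PER_MIN) -> float:
    """Estimate the USD cost of processing `audio_seconds` of audio."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    # Convert seconds to minutes, then apply the per-minute rate.
    return audio_seconds / 60.0 * rate_per_min

# Example: 100 hours of recordings.
print(f"${estimate_cost_usd(100 * 3600):.2f}")
```

At this rate, 100 hours (6,000 minutes) comes to about $18.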


key features

Audio reasoning beyond plain transcription

13+ native languages

Realtime variant for live agents

Speaker diarization and timestamps

Open speech understanding models for transcription, audio reasoning, and voice-agent workflows

spec & usage

Processes up to 30 minutes of audio for transcription, or up to 40 minutes for summarization and Q&A

Built on the Mistral Small 3.1 backbone with a causal audio encoder and transformer decoder

Apache 2.0 release that can run locally on consumer GPUs like an RTX 4090

Context biasing helps it lock onto niche technical terms and proper nouns
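The 30- and 40-minute limits mean longer recordings have to be split before they are sent to the model. A minimal sketch, assuming you cut the audio yourself on a fixed time grid: `plan_chunks` and the 5-second overlap are hypothetical choices, not part of any Voxtral API, and the function only computes the time windows, leaving the actual audio slicing and model calls out.

```python
from typing import List, Tuple

TRANSCRIBE_LIMIT_S = 30 * 60   # transcription limit from the spec above
UNDERSTAND_LIMIT_S = 40 * 60   # summarization / Q&A limit from the spec above

def plan_chunks(duration_s: float, limit_s: float = TRANSCRIBE_LIMIT_S,
                overlap_s: float = 5.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into (start, end) windows no longer than limit_s.

    Consecutive windows overlap by overlap_s seconds so a word cut at a
    boundary appears whole in at least one window.
    """
    if limit_s <= overlap_s:
        raise ValueError("limit_s must be larger than overlap_s")
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + limit_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so the next window re-covers the boundary.
        start = end - overlap_s
    return chunks

# A 75-minute recording, planned for transcription:
for start, end in plan_chunks(75 * 60):
    print(f"{start / 60:.2f} -> {end / 60:.2f} min")
```

The overlap trades a few seconds of duplicate audio for cleaner boundaries; when stitching the transcripts back together you would need to drop the repeated words in each overlap region.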

limitations

the family spans transcription, realtime, and tts models, so you have to pick the right variant for each task

local deployment still requires enough hardware and integration work for real-time use
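Because each family member targets a different job, a small routing table keeps that choice explicit in code. A minimal sketch: the task-to-model mapping below is an assumption inferred from the model names on this page (check Mistral's docs for what each variant actually supports), and `Task`, `MODEL_FOR_TASK`, and `pick_model` are hypothetical names.

```python
from enum import Enum

class Task(Enum):
    TRANSCRIBE = "transcribe"   # batch speech-to-text
    AUDIO_QA = "audio_qa"       # summarization / Q&A over audio
    LIVE_AGENT = "live_agent"   # streaming voice agent
    SPEAK = "speak"             # text-to-speech

# Hypothetical mapping inferred from model names, not an official guide.
MODEL_FOR_TASK = {
    Task.TRANSCRIBE: "Voxtral Transcribe 2",
    Task.AUDIO_QA: "Voxtral Small",
    Task.LIVE_AGENT: "Voxtral Realtime",
    Task.SPEAK: "Voxtral TTS",
}

def pick_model(task: Task, low_memory: bool = False) -> str:
    """Return the (assumed) Voxtral family member for a task."""
    if task is Task.AUDIO_QA and low_memory:
        # Fall back to the smaller understanding model on lighter hardware.
        return "Voxtral Mini"
    return MODEL_FOR_TASK[task]

print(pick_model(Task.LIVE_AGENT))
```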

scope:

audio, language, tool, voice, search, agent, api, local, open-source, real-time

launch: July 15, 2025

