Speechmatics

last release: October 11, 2024

powered by: Ursa 2

goblin vibe check:

worth a look if you need speech-to-text and text-to-speech from one invoice and accuracy actually matters to you

enterprise speech stack powered by proprietary Ursa engines, trained with self-supervised learning across a broad range of accents, dialects, and domains.

speed: <1s (realtime)

key features

18% lower WER than original Ursa across 50+ languages

Real-time transcription under one second

Medical model tuned on clinical language
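For context on the WER figure: word error rate is the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a generic illustration, not Speechmatics' evaluation code. The function names are made up for this example, and it assumes the 18% figure is a relative reduction, which the card does not specify.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on old_wer (assumed reading of '18% lower')."""
    return (old_wer - new_wer) / old_wer


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical numbers only: going from 10.0% to 8.2% WER is an 18% relative drop.
    print(relative_reduction(0.10, 0.082))
```

Under the relative reading, a model at 10% WER would land near 8.2%, not at a negative number; under an absolute reading the claim would mean something quite different, so it is worth checking against the vendor's own benchmark notes.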

spec & usage

Uses self-supervised learning and transformer-based acoustic models trained on millions of hours of unlabeled audio

Scales to roughly 2B parameters with NVIDIA-accelerated inference for high-throughput transcription

Maps acoustic representations into phoneme probabilities before an LLM handles sequence prediction
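The last point can be sketched as a toy two-stage pipeline: per-frame scores are turned into phoneme probabilities, then a separate step picks a sequence. Everything here is an assumption for illustration. The phoneme inventory, the logits, and the function names are hypothetical, and the second stage is a simple CTC-style greedy collapse standing in for the language-model step, which the card does not describe in detail.

```python
import math

# Hypothetical phoneme inventory; "_" marks a blank (no phoneme) frame.
PHONEMES = ["_", "k", "ae", "t"]


def softmax(logits):
    """Turn one frame's raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_posteriors(frame_logits):
    """Stage 1: acoustic scores per frame -> phoneme probabilities per frame."""
    return [softmax(frame) for frame in frame_logits]


def greedy_collapse(posteriors):
    """Stage 2 stand-in: take the top phoneme per frame, merge repeats, drop blanks."""
    sequence = []
    previous = None
    for probs in posteriors:
        best = PHONEMES[probs.index(max(probs))]
        if best != previous and best != "_":
            sequence.append(best)
        previous = best
    return sequence


if __name__ == "__main__":
    # Made-up logits for five audio frames.
    logits = [
        [0.0, 3.0, 0.0, 0.0],
        [0.0, 3.0, 0.0, 0.0],
        [0.0, 0.0, 3.0, 0.0],
        [3.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 3.0],
    ]
    print(greedy_collapse(frame_posteriors(logits)))
```

A production system would replace the greedy step with a model that scores whole sequences in context; the split into "probabilities first, sequence second" is the only part taken from the card.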

scope:

audio, voice, api, cloud, paid, real-time, benchmark-strong

launch: late November 2024
