Inworld Realtime TTS-2

last release: May 5, 2026

powered by: inworld-tts-2

goblin vibe check:

the one to watch if your characters need to sound alive instead of reading lines off a spreadsheet

realtime conversational voice model that conditions on prior audio context, voice direction, and user state. useful for game characters, companions, support agents, and multilingual voice interfaces that need more than static narration.

speed: <200ms to first chunk

key features

conditions on the prior audio of the exchange rather than isolated text-only synthesis

takes natural-language voice direction inline like a prompt

preserves one voice identity across over 100 languages

available through both the inworld api and the inworld realtime api

spec & usage

official research preview post is dated May 5, 2026

docs claim sub-200ms first-chunk latency for streaming

customers on realtime tts 1.5 upgrade by changing the model identifier
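
a minimal sketch of the two usage notes above: keeping the model identifier in one place so an upgrade is a one-line change, and measuring time to first chunk yourself instead of relying on the documented figure. only the identifier inworld-tts-2 comes from this listing. the payload field names, the voice name, the `build_request` helper, and `fake_stream` are hypothetical stand-ins, not the inworld api; swap `fake_stream` for whatever streaming client you actually use.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple

# Identifier taken from this listing. Everything else below is illustrative.
MODEL_ID = "inworld-tts-2"


def build_request(text: str, voice: str, model: str = MODEL_ID) -> dict:
    """Build a hypothetical request payload.

    The field names ("model", "voice", "text") are assumptions for this
    sketch, not the documented Inworld schema.
    """
    return {"model": model, "voice": voice, "text": text}


def time_to_first_chunk(
    open_stream: Callable[[], Iterable[bytes]],
) -> Tuple[float, bytes]:
    """Return (elapsed milliseconds, first chunk) for a streaming call.

    The timer starts before the stream is opened, so connection and
    request time count toward the measurement.
    """
    start = time.perf_counter()
    first = next(iter(open_stream()))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, first


def fake_stream(request: dict, delay_s: float = 0.05) -> Iterator[bytes]:
    """Stand-in for a real streaming client: waits, then yields audio bytes."""
    time.sleep(delay_s)
    yield b"\x00" * 320
    yield b"\x00" * 320


if __name__ == "__main__":
    request = build_request("hello there", voice="narrator")
    elapsed_ms, chunk = time_to_first_chunk(lambda: fake_stream(request))
    print(f"model={request['model']} first chunk: {len(chunk)} bytes "
          f"after {elapsed_ms:.1f} ms")
```

measure from your own region and network path; a number taken against a local stub like `fake_stream` says nothing about the real service.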

limitations

launched as a research preview rather than a fully settled ga product

long-tail language support is described as launch-window experimental

scope:

audio, model, voice, api, cloud, paid, real-time, multimodal

launch: May 5, 2026
