update · Apr 2, 2026 · 1 min read

microsoft releases three new foundational models

microsoft has launched three new AI models for speech transcription, voice generation, and image generation. these models may offer cost-effective alternatives for indie developers.

microsoft has released three foundational AI models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. MAI-Transcribe-1 transcribes speech in 25 languages and is 2.5 times faster than previous offerings. MAI-Voice-1 generates audio quickly and allows for custom voice creation, while MAI-Image-2 focuses on image generation.

for indie developers, these models could provide more affordable options for integrating AI into projects. MAI-Transcribe-1 starts at $0.36 per hour of audio, MAI-Voice-1 is priced at $22 per million characters, and MAI-Image-2 costs $5 per million text input tokens and $33 per million image output tokens. this pricing may help developers keep costs predictable while using advanced AI capabilities.
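
to get a feel for what those rates mean for a small project, here is a rough back-of-the-envelope sketch in python. the prices are the ones quoted above; the function name, parameter names, and sample usage numbers are all hypothetical, and the math ignores anything not mentioned in this post (free tiers, volume discounts, tiered pricing behind the "starts at" rate).

```python
# Prices as quoted in this post. Treat them as a snapshot, not a live price list.
TRANSCRIBE_USD_PER_HOUR = 0.36        # MAI-Transcribe-1 ("starts at" rate)
VOICE_USD_PER_MILLION_CHARS = 22.0    # MAI-Voice-1
IMAGE_TEXT_IN_USD_PER_MILLION = 5.0   # MAI-Image-2, text input tokens
IMAGE_OUT_USD_PER_MILLION = 33.0      # MAI-Image-2, image output tokens


def estimate_monthly_cost(audio_hours=0.0, voice_chars=0,
                          image_text_tokens=0, image_output_tokens=0):
    """Return a per-model cost breakdown in USD (hypothetical helper)."""
    transcription = audio_hours * TRANSCRIBE_USD_PER_HOUR
    voice = voice_chars / 1_000_000 * VOICE_USD_PER_MILLION_CHARS
    image = (image_text_tokens / 1_000_000 * IMAGE_TEXT_IN_USD_PER_MILLION
             + image_output_tokens / 1_000_000 * IMAGE_OUT_USD_PER_MILLION)
    return {
        "transcription": round(transcription, 2),
        "voice": round(voice, 2),
        "image": round(image, 2),
        "total": round(transcription + voice + image, 2),
    }


if __name__ == "__main__":
    # Made-up monthly usage for a small side project.
    breakdown = estimate_monthly_cost(
        audio_hours=100,
        voice_chars=500_000,
        image_text_tokens=200_000,
        image_output_tokens=1_000_000,
    )
    for name, usd in breakdown.items():
        print(f"{name}: ${usd:.2f}")
```

with that sample usage, transcription comes to $36, voice to $11, and image generation to $34, for roughly $81 a month. swap in your own numbers before drawing any conclusions.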

developers should consider testing these models in their workflows, especially if they need transcription or voice generation. using MAI Playground could be a good starting point for experimentation with these new tools.

microsoft plans to release more models in the future, so keep an eye on its updates for options that may suit your projects.

vibe check
microsoft dropped three new models because apparently we didn't have enough options already. the indie dev choice paralysis continues but hey at least these ones might not require a second mortgage