Microsoft has released three foundational AI models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. MAI-Transcribe-1 transcribes speech in 25 languages and is 2.5 times faster than Microsoft's previous offerings. MAI-Voice-1 generates audio quickly and allows for custom voice creation, while MAI-Image-2 focuses on image generation.
For indie developers, these models could provide more affordable options for integrating AI into projects. MAI-Transcribe-1 starts at $0.36 per hour of audio, MAI-Voice-1 is priced at $22 per million characters, and MAI-Image-2 costs $5 per million text input tokens and $33 per million image output tokens. This pricing may help developers manage costs while using advanced AI capabilities.
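To get a feel for what those rates mean for a small project, here is a rough budgeting sketch in Python. It uses only the prices quoted above, treated as a snapshot that may change. The function names are made up for illustration and are not part of any Microsoft SDK, and the token counts for MAI-Image-2 are inputs you would have to measure yourself, since the number of tokens per image is not stated here.

```python
# Rough cost estimator based on the prices quoted in this post (USD).
# These constants are a snapshot, not a live rate card; check current pricing.
TRANSCRIBE_USD_PER_HOUR = 0.36              # MAI-Transcribe-1, starting price
VOICE_USD_PER_MILLION_CHARS = 22.0          # MAI-Voice-1
IMAGE_INPUT_USD_PER_MILLION_TOKENS = 5.0    # MAI-Image-2, text input
IMAGE_OUTPUT_USD_PER_MILLION_TOKENS = 33.0  # MAI-Image-2, image output


def transcription_cost(audio_hours: float) -> float:
    """Cost of transcribing the given number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Cost of synthesizing speech for the given number of characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_input_tokens: int, image_output_tokens: int) -> float:
    """Cost of image generation; token counts are supplied by the caller."""
    input_cost = text_input_tokens / 1_000_000 * IMAGE_INPUT_USD_PER_MILLION_TOKENS
    output_cost = image_output_tokens / 1_000_000 * IMAGE_OUTPUT_USD_PER_MILLION_TOKENS
    return input_cost + output_cost


def monthly_estimate(audio_hours: float, characters: int,
                     text_input_tokens: int, image_output_tokens: int) -> float:
    """Sum of all three services for one month of (hypothetical) usage."""
    return (transcription_cost(audio_hours)
            + voice_cost(characters)
            + image_cost(text_input_tokens, image_output_tokens))


if __name__ == "__main__":
    # Hypothetical workload: 100 hours of audio, 500,000 characters of speech,
    # 200,000 text input tokens and 1,000,000 image output tokens.
    total = monthly_estimate(100, 500_000, 200_000, 1_000_000)
    print(f"Estimated monthly cost: ${total:.2f}")
```

At these rates, 100 hours of transcription works out to $36 and 500,000 characters of generated speech to $11, so for light workloads the image output tokens are likely to dominate the bill.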
Developers who need transcription or voice generation should consider testing these models in their workflows. MAI Playground could be a good starting point for experimenting with the new tools.
Keep an eye on updates from Microsoft, which plans to release more models in the future; these may further expand your development options.