Microsoft introduces MAI-Transcribe-1, Voice-1, Image-2 AI models: Details
Microsoft's MAI-Transcribe-1, MAI-Voice-1 and MAI-Image-2 models are now live, offering speed improvements, multi-language support and competitive pricing for developers and enterprises
Microsoft has announced a new set of AI models, including MAI-Transcribe-1, MAI-Voice-1 and MAI-Image-2, aimed at improving speech, voice and image generation capabilities. According to the company, these models are now available through Microsoft Foundry and the MAI Playground (US-exclusive), with a focus on faster performance, efficiency and competitive pricing. The rollout brings upgrades across transcription accuracy, voice generation and image creation, with Microsoft also integrating these capabilities into its own products.
Transcription, voice and image models
MAI-Transcribe-1: Microsoft said this model is designed for speech-to-text tasks and supports transcription across the 25 most-used languages, based on the FLEURS benchmark. The company said the model is built to handle real-world audio conditions and delivers batch transcription speeds 2.5 times faster than its existing Azure Fast offering.
MAI-Voice-1: This model focuses on voice generation, producing speech with natural tone, emotional range and consistency across longer content. Microsoft has also added support for creating custom voices using a short audio sample. The model can generate up to 60 seconds of audio in one second, with the company highlighting efficient GPU usage for cost-effective performance.
MAI-Image-2: According to Microsoft, this model offers at least twice the generation speed of earlier systems on Foundry and Copilot, based on production data. Microsoft said the model is designed to deliver realistic lighting, accurate skin tones and clear text rendering for visual content. It is also being rolled out in phases across services such as Bing and PowerPoint.
Availability and pricing
Microsoft said all three models are available starting today on Microsoft Foundry, with MAI Playground access currently limited to users in the US. The company has positioned the models as offering competitive price-to-performance across cloud providers.
Pricing starts at $0.36 per hour for MAI-Transcribe-1 and $22 per one million characters for MAI-Voice-1. MAI-Image-2 is priced at $5 per one million tokens for text input and $33 per one million tokens for image output.
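For developers budgeting against these starting rates, the arithmetic can be sketched in a few lines of Python. This is a rough estimate only: it assumes billing scales linearly with usage, that the $0.36 hourly rate applies per hour of audio processed, and that there are no minimums, tiers or discounts, none of which the announcement spells out. The function names are illustrative and are not part of any Microsoft SDK.

```python
# Starting prices quoted for the three models, in US dollars.
TRANSCRIBE_USD_PER_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_INPUT_TOKENS = 5.0
IMAGE_USD_PER_MILLION_OUTPUT_TOKENS = 33.0

MILLION = 1_000_000


def transcribe_cost(audio_hours: float) -> float:
    """Estimated MAI-Transcribe-1 cost, assuming linear per-hour billing."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated MAI-Voice-1 cost for a given number of input characters."""
    return characters / MILLION * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_input_tokens: int, image_output_tokens: int) -> float:
    """Estimated MAI-Image-2 cost: text input and image output are priced separately."""
    input_part = text_input_tokens / MILLION * IMAGE_USD_PER_MILLION_INPUT_TOKENS
    output_part = image_output_tokens / MILLION * IMAGE_USD_PER_MILLION_OUTPUT_TOKENS
    return input_part + output_part
```

Under these assumptions, 100 hours of transcription would come to about $36, and 500,000 characters of generated speech to about $11.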
Microsoft added that these models are also being used within its own products and are available for developers to build applications and services.
First Published: Apr 03 2026 | 4:14 PM IST
