Microsoft introduces MAI-Transcribe-1, Voice-1, Image-2 AI models: Details

Microsoft's MAI-Transcribe-1, MAI-Voice-1 and MAI-Image-2 models are now live, offering speed improvements, multi-language support and competitive pricing for developers and enterprises

MAI-Transcribe-1, Voice-1, Image-2 AI models (Image: Microsoft)

Last Updated: Apr 03 2026 | 4:14 PM IST

Microsoft has announced a new set of AI models, MAI-Transcribe-1, MAI-Voice-1 and MAI-Image-2, aimed at improving speech transcription, voice generation and image generation. According to the company, the models are now available through Microsoft Foundry and the MAI Playground (US-exclusive), with a focus on faster performance, efficiency and competitive pricing. The rollout brings upgrades across transcription accuracy, voice generation and image creation, and Microsoft is also integrating these capabilities into its own products.

Transcription, voice and image models

MAI-Transcribe-1: Microsoft said this model is designed for speech-to-text tasks and supports transcription across the 25 most-used languages, based on the FLEURS benchmark. The company said the model is built to handle real-world audio conditions and delivers batch transcription speeds 2.5 times faster than its existing Azure Fast offering.

MAI-Voice-1: This model focuses on voice generation, producing speech with natural tone, emotional range and consistency across longer content. Microsoft has also added support for creating custom voices using a short audio sample. The model can generate up to 60 seconds of audio in one second, with the company highlighting efficient GPU usage for cost-effective performance.

MAI-Image-2: According to Microsoft, this model generates images at least twice as fast as earlier systems on Foundry and Copilot, based on production data. Microsoft said the model is designed to deliver realistic lighting, accurate skin tones and clear text rendering for visual content. It is also being rolled out in phases across services such as Bing and PowerPoint.

Availability and pricing

Microsoft said all three models are available starting today on Microsoft Foundry, with MAI Playground access currently limited to users in the US. The company has positioned the models as offering competitive price-to-performance across cloud providers.

Pricing starts at $0.36 per hour for MAI-Transcribe-1 and $22 per one million characters for MAI-Voice-1. For MAI-Image-2, pricing starts at $5 per one million tokens for text input and $33 per one million tokens for image output.
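For developers sizing up a workload, the quoted starting prices translate into a rough cost estimate. The short Python sketch below is illustrative only: the function and constant names are hypothetical, it assumes simple linear billing at the starting rates quoted above, and it does not call any Microsoft API. Actual bills may differ by tier, region or rounding rules.

```python
# Starting prices quoted in the article, in USD.
# Assumption: billing is linear at these rates; real billing may differ.
TRANSCRIBE_USD_PER_HOUR = 0.36               # MAI-Transcribe-1, per hour
VOICE_USD_PER_MILLION_CHARS = 22.0           # MAI-Voice-1, per 1M characters
IMAGE_USD_PER_MILLION_INPUT_TOKENS = 5.0     # MAI-Image-2, text input
IMAGE_USD_PER_MILLION_OUTPUT_TOKENS = 33.0   # MAI-Image-2, image output


def transcribe_cost(audio_hours: float) -> float:
    """Estimated cost of transcribing the given number of hours."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated cost of generating speech for the given character count."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of image generation from input and output token counts."""
    input_part = input_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_INPUT_TOKENS
    output_part = output_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_OUTPUT_TOKENS
    return input_part + output_part


if __name__ == "__main__":
    # Example workload; the quantities here are made up for illustration.
    print(f"100 hours of transcription: ${transcribe_cost(100):.2f}")
    print(f"500,000 characters of speech: ${voice_cost(500_000):.2f}")
    print(f"1M input + 1M output image tokens: ${image_cost(1_000_000, 1_000_000):.2f}")
```

At these rates, 100 hours of transcription would come to about $36 and 500,000 characters of generated speech to about $11.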

Microsoft added that these models are also being used within its own products and are available for developers to build applications and services.