Microsoft AI (MAI) has unveiled the first public previews of two new AI models designed to enhance Copilot and related AI experiences. The company says these models aim to provide expressive, high-fidelity speech and advanced instruction-following capabilities, supporting a range of applications from podcasts to interactive storytelling. MAI-Voice-1 and MAI-1-preview mark the initial steps in Microsoft’s broader vision of creating purpose-built, consumer-focused AI systems.
MAI-Voice-1: Expressive speech generation
According to an official press release, the MAI-Voice-1 model is Microsoft’s first highly expressive speech generation system, capable of producing a full minute of audio in under a second on a single GPU. The company added that the model is optimised for both single- and multi-speaker scenarios, offering high-fidelity audio for voice-driven AI interactions.
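To put that throughput claim in context, generating 60 seconds of audio in under one second of wall-clock time implies a real-time factor above 60x. The quick calculation below is a sketch based only on the figures in the press release; the exact generation time is an upper bound, not a measured benchmark:

```python
# Implied real-time factor from Microsoft's stated figures:
# a full minute of audio generated in under one second on a single GPU.
audio_seconds = 60.0
generation_seconds = 1.0  # upper bound per the press release, not a measurement

rtf = audio_seconds / generation_seconds
print(f"real-time factor > {rtf:.0f}x")  # real-time factor > 60x
```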
MAI-Voice-1 is already integrated into Microsoft’s Copilot Daily and Podcasts features. Additionally, it is available in the new Copilot Labs environment, where users can experiment with speech-based applications such as interactive “choose your own adventure” stories or guided meditations. The model is designed to demonstrate how speech can serve as a primary interface for AI companions, said Microsoft.
MAI-1-preview: Instruction-following and model testing
The second model, MAI-1-preview, is a mixture-of-experts foundation model trained on approximately 15,000 NVIDIA H100 GPUs. Microsoft says the model has been pre-trained and post-trained to provide instruction-following capabilities for text-based queries. It is intended for use in Copilot text scenarios and for testing on LMArena, a community platform for model evaluation.
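For readers unfamiliar with the term, a mixture-of-experts model routes each token through only a few of many specialist sub-networks rather than one monolithic network, so total parameter count can grow while per-token compute stays modest. The sketch below illustrates that routing idea in general terms; the expert count, dimensions, and top-k routing scheme are illustrative assumptions, not details of MAI-1-preview:

```python
# Minimal, hypothetical sketch of mixture-of-experts (MoE) routing.
# This is NOT Microsoft's implementation; all names and sizes are made up.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # hypothetical number of expert sub-networks
TOP_K = 2         # each token is routed to its 2 highest-scoring experts
D_MODEL = 16      # hypothetical hidden dimension

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(NUM_EXPERTS)]
# The router scores every token against every expert.
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.1

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """Route each token through its top-k experts and mix the outputs.

    Only k of the n experts run per token, which is why MoE models can
    hold many parameters while keeping per-token compute low.
    """
    logits = tokens @ router_w                       # (n_tokens, NUM_EXPERTS)
    top_k = np.argsort(logits, axis=-1)[:, -TOP_K:]  # indices of the k best experts
    out = np.zeros_like(tokens)
    for i, token in enumerate(tokens):
        chosen = logits[i, top_k[i]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()                     # softmax over the chosen experts
        for w, e in zip(weights, top_k[i]):
            out[i] += w * (token @ experts[e])
    return out

tokens = rng.standard_normal((4, D_MODEL))  # a toy batch of 4 token vectors
print(moe_layer(tokens).shape)              # (4, 16)
```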
MAI-1-preview will initially be available to a limited set of users and trusted testers, with API access offered for early feedback. Microsoft plans to use these interactions to improve the model.
Future roadmap and computing infrastructure
Microsoft AI emphasises that these releases are early steps toward a broader vision of specialised models serving multiple user intents. The team has operationalised its next-generation GB200 cluster to support model development and deployment at scale. MAI says it will continue to leverage a mix of in-house models, partner technologies, and open-source innovations to deliver improvements across millions of interactions daily.
