Microsoft AI (MAI) has unveiled the first public previews of two new AI models designed to enhance Copilot and related AI experiences. The company says these models aim to provide expressive, high-fidelity speech and advanced instruction-following capabilities, supporting a range of applications from podcasts to interactive storytelling. MAI-Voice-1 and MAI-1-preview mark the initial steps in Microsoft’s broader vision of creating purpose-built, consumer-focused AI systems.

MAI-Voice-1: Expressive speech generation

According to an official press release, the MAI-Voice-1 model is Microsoft’s first highly expressive speech generation system, capable of producing a full minute of audio in under a second on a single GPU. The company added that the model is optimised for both single- and multi-speaker scenarios, offering high-fidelity audio for voice-driven AI interactions.

MAI-Voice-1 is already integrated into Microsoft’s Copilot Daily and Podcasts features. Additionally, it is available in the new Copilot Labs environment, where users can experiment with speech-based applications such as interactive “choose your own adventure” stories or guided meditations. The model is designed to demonstrate how speech can serve as a primary interface for AI companions, said Microsoft.

MAI-1-preview: Instruction-following and model testing

The second model, MAI-1-preview, is a mixture-of-experts foundation model trained on approximately 15,000 NVIDIA H100 GPUs. Microsoft says the model has been pre-trained and post-trained to provide instruction-following capabilities for text-based queries. It is intended for use in Copilot text scenarios and for testing on LMArena, a community platform for model evaluation.