The launch of Maya1, a new artificial intelligence (AI) model for converting text into speech (text-to-speech, or TTS), signals a paradigm shift in such services. Built by two 23-year-olds in Bengaluru, it is ranked second among open-weight voice AIs (models whose weights users can download and modify) and 20th globally on quality benchmarks. The model delivers a few technical breakthroughs, and it was developed on a shoestring budget. Most TTS models, such as those from Google, ElevenLabs, or OpenAI, rely on libraries of recorded voices. Maya1 instead lets users design voices to custom specifications with natural-language prompts such as “calm, elderly male schoolteacher with an American accent”.
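To make the idea of a voice "specification" concrete, here is a minimal Python sketch of how an application might assemble such a natural-language description from a few attributes. The `VoiceSpec` class and its field names are hypothetical and are not part of Maya1; the model is described only as accepting free-text descriptions, so this simply builds one.

```python
from dataclasses import dataclass


@dataclass
class VoiceSpec:
    """Hypothetical container for voice attributes (not a Maya1 API)."""
    tone: str
    age: str
    gender: str
    role: str
    accent: str

    def describe(self) -> str:
        """Render the attributes as one plain-English voice description."""
        # Pick "a" or "an" based on the first letter of the accent name.
        article = "an" if self.accent[0].lower() in "aeiou" else "a"
        return (
            f"{self.tone}, {self.age} {self.gender} {self.role} "
            f"with {article} {self.accent} accent"
        )


spec = VoiceSpec("calm", "elderly", "male", "schoolteacher", "American")
print(spec.describe())
# prints: calm, elderly male schoolteacher with an American accent
```

The resulting string would then be passed to the model together with the text to be spoken; how exactly it is passed depends on the model's own prompt format, which this sketch does not assume.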

It supports over 20 controllable speaking styles, including natural patterns like hesitation, excitement, and warmth. Users can also insert more than 20 emotion tags, such as laugh, sigh, whisper, anger, and giggle, into the text, and the model changes its speech patterns accordingly. These controls can be combined, allowing a switch of tone mid-sentence and a natural, emotional speaking style. Equally important, Maya1 does this without discernible lag, scanning text and speaking with less than 100 milliseconds (ms) of latency. That responsiveness helps it sound close to human speech, since the model reads text at speeds comparable to an educated human reader.

This versatility makes it suitable for a wide range of use cases. For podcasts, audiobooks, and video content, Maya1 can narrate long-form material with emotional range, using different voices for different personalities. It can do the same for video-game characters that need emotional delivery. It can also serve as an AI voice assistant for accessibility tasks, aiding users who need visual assistance, and for customer service, since its low latency allows responsive interaction.

Technically, Maya1 is a three-billion-parameter decoder-only transformer, fine-tuned from a Llama base. It is available under the Apache 2.0 licence, and it is free to download, tweak, and deploy for commercial use. Given its moderate hardware requirements, it can run locally on a device with a single graphics processing unit (GPU), so there is no cloud dependency, allowing it to be deployed easily in rural and low-bandwidth settings.
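A rough back-of-the-envelope calculation shows why a three-billion-parameter model can fit on a single GPU. The Python sketch below estimates the memory needed for the weights alone at a few common numeric precisions. The precisions listed are assumptions for illustration, since the precision Maya1 is actually deployed in is not stated here, and the estimate excludes activations, attention caches, and any audio-decoding components.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights only, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


PARAMS = 3e9  # three billion parameters, as stated for Maya1

# Assumed precisions; the one actually used for deployment is not specified.
for label, nbytes in [("float32", 4), ("float16/bfloat16", 2), ("int8", 1)]:
    print(f"{label}: {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
# prints:
# float32: 11.2 GiB
# float16/bfloat16: 5.6 GiB
# int8: 2.8 GiB
```

At 16-bit precision the weights come to under 6 GiB, which is within the memory of many consumer GPUs, though real-world usage will be higher once runtime overhead is included.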