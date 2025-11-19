India’s first homegrown foundational large language model (LLM) built by Sarvam AI may come out early next year, the company’s co-founder Vivek Raghavan said on Wednesday.

The launch will, in all likelihood, happen before or during the India AI Impact Summit, a flagship event the government is hosting to showcase the country’s capabilities and advances in AI, particularly in sovereign models.

“We are trying to get the model out by February,” Raghavan told Business Standard on the sidelines of the Bengaluru Tech Summit.

Sarvam AI was selected by the India AI Mission this year to build the country’s first sovereign LLM ecosystem, developing an open-source 120-billion-parameter AI model to enhance governance and public service access through use cases like 2047: Citizen Connect and AI4Pragati.

In a panel discussion, Raghavan said, “The existing models have sub 1 per cent Indian data.” Sarvam’s LLM will be trained on more than 17 trillion tokens, 17–20 per cent of which will come from Indian data.

Besides Sarvam, Soket will develop India’s first open-source 120-billion-parameter foundation model optimised for the country’s linguistic diversity, targeting sectors such as defence, healthcare and education. Gnani will build a 14-billion-parameter voice AI foundation model delivering multilingual, real-time speech processing with advanced reasoning capabilities. Gan AI will create a 70-billion-parameter multilingual foundation model focused on text-to-speech.

When asked whether the latest Digital Personal Data Protection (DPDP) Rules would require LLM makers to modify their models to comply, Raghavan said it was unlikely. “As of now, I do not see the problem because LLMs are memory-less systems which do not store data unlike apps which store consumer data. However, these rules are subject to interpretation,” he added.

If retraining were needed, it could involve significant cost and effort. Under the rules, companies that process personal data, known as data fiduciaries, must clearly explain to users, or data principals, how their personal data will be used. The rules also require informed consent for any processing of personal data, along with easy provisions to withdraw that consent.