The founders of San Francisco-based TurboML, a real-time machine learning platform, have launched a strategic initiative to bring together artificial intelligence (AI) researchers of Indian origin from around the world to build an AI foundational model for Indian languages.
Siddharth Bhatia, the company’s cofounder, revealed that they have discussed their plan with the Ministry of Electronics and Information Technology (MeitY) and recently met MeitY Minister Ashwini Vaishnaw as well. The team aims to develop the AI foundational model within 8 months at a budget of under $12 million.
“Indians have been on the forefront of AI innovation, right from the original Transformers paper, to teams behind leading models like OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and Meta’s Llama, and we plan to collaborate with such exceptional minds,” Bhatia shared. To ensure long-term sustainability, they are also exploring commercial offerings alongside research and development efforts.
After a detailed cost analysis, Bhatia estimates the required investment at $11.5 million. This budget covers two phases of development, including GPU processing costs, training and retraining datasets, and hiring a core team of around 20 people — most of whom will be based in India.
He noted that GPU costs account for a significant share of the budget, which factors in the current commercial rate per GPU hour. However, costs could drop substantially under MeitY’s subsidy scheme, potentially bringing the rate down to $1 per hour.
“With the recent launch of DeepSeek from China, many researchers are exploring innovative ways to build AI models at significantly lower costs — unlike the billions of dollars spent on models like ChatGPT,” Bhatia explained.
One of the biggest challenges in developing foundational AI models for India is the lack of internet-scale data, unlike the US or China. Additionally, India’s immense linguistic diversity presents unique hurdles. “Even when we speak English, it’s often interspersed with Hindi — this code-switching is common across many Indian languages and dialects,” Bhatia pointed out. This linguistic complexity requires advanced techniques to build truly effective models.
Bhatia added that the team is committed to open-sourcing essential components, including frameworks and code, reinforcement learning data, and model weights for select models.
Responding to their plan, serial investor Vinod Khosla posted on X, “Every country of size will likely have its own AI model and approach, hyperlocalised. Trump has made clear no country can rely on US models as its only option.”
Focus on Languages
* Efforts are on to rope in AI researchers of Indian origin to build a foundational AI model for Indian languages
* The aim is to bring in Indian AI experts who have worked on OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Gemini, and Meta’s Llama