Sunday, June 28, 2026 | 11:39 PM IST
Business Standard

India eyes AI-powered official stats architecture for seamless data access

Once the project is complete, India may be one of the first countries to have a dedicated LLM for its large and multilingual official economy-related statistics



Himanshi Bhardwaj, Shine Jacob | New Delhi/Chennai


India’s official economic statistics, now spread over multiple databases and government agencies, will soon be stored and made available to all users from a single platform, making data access seamless for policymakers, researchers and businesses, people aware of the development said.
 
The Ministry of Statistics and Programme Implementation (Mospi) is building a first-of-its-kind common data platform (CDP) with a plan to develop it into a dedicated large language model (LLM) for official statistics, they said.
 
A CDP is a shared digital system that brings data from different sources into one place for easy access. Such integration eliminates fragmentation across ministries, departments, and datasets, and enables users to find, compare, and reuse official data without poring over separate portals and document formats such as PDFs and spreadsheets.
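To make the idea concrete, here is a minimal sketch of how records from separate releases could be merged into one queryable store. It illustrates the general concept only and is not Mospi's or BCT's design. The source names, indicator names, values, and the helpers `build_platform` and `lookup` are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical records, as if extracted from three separate releases.
# Indicator names and values are placeholders, not real statistics.
pdf_release = [{"indicator": "indicator_a", "year": 2023, "value": 10.0}]
spreadsheet_release = [{"indicator": "indicator_b", "year": 2023, "value": 20.0}]
portal_release = [{"indicator": "indicator_a", "year": 2024, "value": 11.0}]


def build_platform(sources):
    """Merge records from named sources into one index keyed by indicator."""
    platform = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            # Keep provenance so each figure can be traced back to its source.
            platform[record["indicator"]].append({**record, "source": source_name})
    for rows in platform.values():
        rows.sort(key=lambda row: row["year"])  # one time-ordered series each
    return dict(platform)


def lookup(platform, indicator):
    """Return the merged series for an indicator, or an empty list."""
    return platform.get(indicator, [])


platform = build_platform({
    "pdf": pdf_release,
    "spreadsheet": spreadsheet_release,
    "portal": portal_release,
})
series = lookup(platform, "indicator_a")
for row in series:
    print(row["year"], row["value"], row["source"])
```

A user asks for one indicator and gets a single time-ordered series, with each figure tagged by its origin, instead of opening the PDF and the portal separately.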
 
Once the project is complete, India may be one of the first countries to have a dedicated LLM for its large and multilingual official economy-related statistics.
 
An LLM is an advanced artificial intelligence (AI) system built and trained on vast amounts of text to understand and generate human-like language. It will enable users to access historical data and analyses reliably with prompts.
 
Several countries, including the United Kingdom, the Netherlands, Canada, Finland, and Singapore, are using a combination of LLMs and retrieval from official databases, rather than training a dedicated LLM for statistics.
 
The government is yet to reveal whether its larger plan is to have a small language model (SLM) with retrieval-augmented generation that has narrower scope and requires less computing power, or an LLM with advanced reasoning, multilingual support, and broader analytical capabilities.
 
Chennai-based information-technology (IT) services and digital transformation firm Bahwan CyberTek (BCT) has been awarded the responsibility of delivering the CDP for Mospi’s National Accounts Division (NAD). This will be the first step towards developing an LLM, the people quoted above said on condition of anonymity.
 
The NAD is responsible for compiling key national economic indicators — including gross domestic product (GDP), national income, savings, and capital formation — for which it depends on extensive datasets sourced from ministries, state governments, regulators, and public-sector organisations.
 
“We currently release the data sources required for GDP calculations in the form of PDFs and Excel sheets. The aim of this initiative is to integrate these sources into a single platform, making GDP calculations easier,” said N K Santoshi, director general (central statistics), Mospi, adding that the idea is to develop the platform into an LLM in the future.
 
At present, many indicators used in GDP compilation are taken from different releases, formats, and departments, running into about 300 sources.  
 
“The objective is to build a modern data management and statistics platform for the country. The first phase involves data cleansing, structuring, and creating a centralised data platform. Over an 18-month period, we will help create the foundational architecture required for future AI applications,” confirmed Vish Srinivasan, chief executive officer of BCT's Global Services Business, when asked about the development.
 
The new platform by BCT is being designed to transform the management of India’s economic statistics by integrating data from NAD’s 18 business units and hundreds of disparate data sources into a unified and trusted data ecosystem. BCT’s solution will automate data integration, enhance data quality, and establish a centralised repository that serves as a single source of truth.
 
The company has already worked with the statistics departments in the United Arab Emirates (UAE) and Oman, but the Indian statistics department works on a much larger scale. The CDP will be built on a modern data lakehouse architecture.
 
The platform will provide AI-powered real-time analytics and reporting capabilities, enabling policymakers to monitor economic trends more effectively, accelerate decision-making, and improve operational efficiency. The platform will also support Mospi’s broader modernisation agenda and future data-sharing requirements across government agencies.
 
Srinivasan said his company has around 3,500 employees across major geographies and annual revenue of $240 million from its services business, which has grown at a compound annual growth rate of 30-35 per cent over the last few years. “Our plan is to clock $500 million within three years,” he said.
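The arithmetic behind that target can be checked with a short calculation. This is a rough sketch that assumes the $240 million figure is the starting base and that the $500 million target is reached at exactly three years. The function names are illustrative.

```python
def required_cagr(start, target, years):
    """Compound annual growth rate needed to go from start to target."""
    return (target / start) ** (1 / years) - 1


def projected_revenue(start, rate, years):
    """Revenue after compounding start at a fixed annual rate."""
    return start * (1 + rate) ** years


# Figures in $ million, taken from the article; the timing is an assumption.
rate = required_cagr(240, 500, 3)
at_30_per_cent = projected_revenue(240, 0.30, 3)

print(f"Required CAGR: {rate:.1%}")
print(f"Revenue after 3 years at 30%: ${at_30_per_cent:.0f} million")
```

Under these assumptions the required rate is a little under 28 per cent a year. That is below the 30-35 per cent the company says it has recorded, so the target is reachable even if growth slows slightly.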
 
“We are already building AI layers for several customers, but our focus is on AI-driven use cases rather than horizontal platform capabilities. We have spent many years building our US presence and now serve marquee customers including Tesla, Morris, and Nook Digital (part of Barnes & Noble). We believe we can significantly scale our US business,” he added. 

Towards seamless data access

  • India may be one of the first countries to have a dedicated LLM for its large and multilingual official economy-related statistics
  • The platform will provide AI-powered real-time analytics and reporting capabilities, enabling effective monitoring of trends
  • First phase involves data cleansing, structuring, and creating a centralised platform
  • A Chennai-based tech company is building the new integrated data platform