Conflicting definitions across ministries pose a major challenge for artificial intelligence (AI) applications in government, with the Centre now working to create common classifications and standards that can make public datasets interoperable, Saurabh Garg, secretary, Ministry of Statistics and Programme Implementation (MoSPI), said on Thursday.
Speaking at an event on artificial intelligence and digital public infrastructure (DPI) organised by the National Council of Applied Economic Research (NCAER), Garg said India had largely addressed the challenge of systems exchanging data through common APIs and digital platforms, but still needed to tackle what he described as "semantic interoperability" — ensuring that different government datasets use common definitions, classifications and identifiers that AI systems can understand.
"I think where we need to work more is on the semantic interoperability that AI systems or artificial intelligence can understand the context of the definitions and the classifications. And this is extremely important because if a definition or maybe any concept by two systems are different, then those two systems can't talk to each other," said Garg. "Semantics is extremely important and that's one area that we are working on," he added.
He cited the example of "pucca houses", noting that different government departments use different criteria to define the term. While some ministries focus on the nature of walls, others consider roofing standards, and still others take flooring into account because of its implications for public health. Such differences can create complications when designing and implementing welfare programmes aimed at assisting households without permanent housing, he said.
The government is therefore working on standardising classifications, identifiers and metadata across datasets as part of a broader effort to make public data AI-ready.
Garg said India had already built the foundations of digital public infrastructure through identity, payments and consent-based data-sharing systems such as Aadhaar, UPI, DigiLocker and Account Aggregator. The next phase, he said, would involve creating the data architecture needed to support AI-driven applications and public services.
"The raw material of AI is data. You have energy, you have chips, you have models, but without data, AI is really not there. And we need to ensure that data is interoperable, harmonised and machine-readable. That is something which we are still grappling with, at least in India and many other countries, too," Garg said, arguing that access to harmonised, machine-readable and trustworthy data would be critical to unlocking the technology's economic and social benefits.
According to him, government data represents a particularly valuable resource because of its scale and its potential to be used for public benefit. However, datasets remain fragmented across departments, stored in different formats and governed by varying standards, making integration difficult.
To address these challenges, the government has identified priority datasets, common identifiers and internationally recognised classifications that can serve as the basis for interoperability. It is also working on data-sharing protocols, quality assessment frameworks and metadata standards, he said.
Garg added that the objective was to ensure that government data adheres to the FAIR principles — making it findable, accessible, interoperable and reusable — while maintaining safeguards for privacy and confidentiality under existing laws.
"There needs to be a basic quality assessment; otherwise, the credibility of data is suspect. And it is this data organisation within the government that is an extremely high priority for us," he said.