One of the strong points of the Indian statistical system is the availability of information from multiple sources on key socioeconomic indicators. Information from independent sample surveys carried out through an established institutional system, such as the National Sample Survey (NSS) and the National Family Health Survey (NFHS), has often been used to cross-validate statistics available from official or departmental sources and major government programmes. The latter can be accessed through dashboards on the websites of the different departments. The progress of major national programmes is also widely published by the official agencies, unlike in the past, when such progress was reported only in their annual reports.
Understandably, there are differences in the magnitude, and even the direction of change, between the information from government departments and that from established institutional surveys. These discrepancies in the statistics produced by different agencies generally arise from issues of definitions, coverage, and differences in statutory and procedural requirements. The degree of autonomy of the agency collecting the data, and its reliance on temporary and semi-skilled personnel of varying capacity, as against the well-trained permanent survey staff employed by an established national organisation, also contribute to these differences. The data generated as a by-product of statutes and administrative interventions often turns out to be incomplete or inaccurate. The differences mostly stem not from any explicit or deliberate agency bias influencing data collection to produce certain outcomes, but from inherent procedural differences.
The national surveys conducted by the established institutions, however, are very different. They are not designed in the context of any specific programme, and their concepts, sampling frame and methodology are developed through wider consensus among prospective user agencies, including researchers and civil society. The NFHS and NSS are now acknowledged as providing comparable data for assessing socio-economic development and are used in mainstream research and policy deliberations. Their results are mostly accepted by data users, not only for the robustness of their methodology but also for their holistic coverage of several related variables within a consistent frame.
The situation has been changing over the past couple of decades with the emergence of several agencies conducting surveys at the national level. This, however, has resulted in conflicting trends and patterns for certain parameters. The NFHS, for example, has reported a decline in nutritional levels for certain groups, against claims to the contrary under government missions. The differences have led to a healthy debate on measurement issues. The wide availability of unit-level data from the NFHS and the use of advanced data analytics software have permitted further probing into conceptual issues.
The discrepancies in the data, nonetheless, have remained a nagging problem. Governments, at both national and state levels, have sometimes cited the statistics from the national-level surveys while launching certain programmes or claiming success in their interventions. Researchers using the national survey data, on the other hand, have often shown that progress on development parameters, after adjusting for definitions, coverage and timing, is significantly below the claims of the ministries. In such situations, officials have generally chosen to ignore the data from the national surveys and trust the departmental information collected through the same system that is responsible for implementing the programmes.
The discrepancies between the survey and official data have become more serious in recent years. Although sampling and non-sampling issues in the national surveys can explain a part of the discrepancies, much of the gap is being attributed to agency bias, putting a question mark over the robustness of the official data. Recently, there was a controversy over the female-male ratio thrown up by the NFHS, which was noted as being much higher than the Census estimates. It was, however, promptly explained that the former excludes from its coverage the non-household population, which is predominantly male.
The national surveys, thus, are believed to provide a more objective picture of the ground reality, since neither the respondent nor the enumerator has any personal gain from the recorded information. The departmental data, on the other hand, come from implementing agencies through the reporting of achievements against targets. These achievements often reflect the physical completion of facilities and not necessarily their usage by, or accessibility to, the respondents. It is important to note that individual responses, based on the respondents' perception and cross-verified by the investigator, are likely to be unbiased when neither party has a personal stake in the recorded response, as is the case in the national institutionalised surveys.
The Socio-Economic Caste Census, for example, overstated the deprivation of several caste groups because respondents were aware that the information would be used to determine the poverty entitlements of their caste. Similarly, administrative reporting, meant to aid implementation, monitoring and evaluation, is likely to carry agency or enumerator bias, since programme-based information is usually reported by the very officials responsible for implementation against the assigned targets.
Given the scepticism about official data, survey-based validation is now being built into government programmes. Such evaluative surveys are, however, very different from the national surveys mentioned above, as they are mostly outsourced to private agencies or public undertakings in a manner that does not rule out the possibility of agency bias. Most of these data-gathering agencies work in a competitive environment and are willing to negotiate costs, duration and the like, which tends to affect the quality of the staff, their emoluments and, ultimately, the quality of the information. Consequently, official claims about the outcomes of projects or missions mostly carry an underlying credibility question.
Understanding data anomalies
It is important that we probe deeper into the anomalies between the data coming from institutionalised surveys and official sources.
The Jal Jeevan Mission (JJM) is a key national initiative to provide a tap water connection to every rural household. The JJM, launched on August 15, 2019, took as its baseline that only 17 per cent of rural households had tap water connections. The Multiple Indicator Survey (MIS) of the NSS, conducted as part of its regular survey on housing conditions and covering the period from July to December 2018, had however reported that 21.6 per cent of rural households had piped water, much higher than the baseline adopted by the JJM a few months later.
The NSS estimate for 2020–21 is 24.8 per cent, still much lower than the figure given by the ministry, as noted below. The NFHS-5, conducted during 2019–21, estimated that 22.6 per cent of rural households in the country get improved drinking water from taps within the dwelling or the yard/plot, very close to the NSS estimate from the MIS.
Interestingly, against the MIS of the NSS, which suggests that 24.8 per cent of rural households had access to piped water in 2020–21 (though the results in the report refer to persons), the figure from the Ministry of Drinking Water and Sanitation for December 31, 2020 is 32.5 per cent. According to the JJM, 54 per cent of rural households had access to safe drinking water by the end of March 2023. This must be seen against the mission's goal of providing safe and adequate drinking water through individual tap connections to all rural households by 2024, which, going by the recent trend discussed above, looks extremely ambitious. If accurate, the achievement is commendable, but the pace of progress it implies is partly an artefact of the government's baseline figure of 17 per cent, which is much lower than the NSS or NFHS estimates noted above.
Similarly, the NFHS and NSS estimates of rural sanitation coverage are much below the claims made under the Swachh Bharat Mission (SBM), based on the National Annual Rural Sanitation Survey (NARSS) (2019–20) conducted by the Ministry of Drinking Water and Sanitation at the behest of the World Bank. A non-governmental agency was tasked with the survey to measure the performance of each state, to which incentive-linked disbursements were tied. At the ground level, there was widespread awareness of and enthusiasm for the SBM, with communities keen to have their village declared open defecation free (ODF). This could be a factor in the overestimation of the coverage.
The NSS of 2018 estimated that 71.3 per cent of rural households reported access to a toilet, with 69.3 per cent having an improved latrine. The NFHS (2019–21), too, reported 71 per cent of rural households as having access to improved sanitation facilities, including 7.4 per cent that reported the facilities as shared. Both the NSS and NFHS estimates fall short of the government claim of rural India having become ODF and of the NARSS coverage of 93.5 per cent. Undoubtedly, the mission has brought a significant reduction in the number of households without access to a latrine. This notwithstanding, both the NSS and NFHS show lower achievements than the official claims.
The economic advisory committee to the Government of India has tried to reconcile such divergences in the data on drinking water. It argues that timely release of data and standardisation of definitions and questionnaires would remove such anomalies. There can be no two opinions on the need for timely release of data, although some delay in collating and scrutinising the field data of the NSS and NFHS is inevitable; the situation has, in any case, improved of late.
Standardising definitions and questionnaires in surveys to match administrative yardsticks, however, is not easy. Administrative yardsticks often give a one-dimensional view of programme achievements. The NFHS and NSS, on the other hand, employ standard definitions that ensure comparability with previous rounds and generally follow internationally accepted conventions. Administrative data, moreover, often do not have complete coverage and are designed for monitoring and evaluating a specific programme in terms of the outputs and achievements set out in the project documents.
The estimates based on the national surveys are therefore helpful in cross-validating the claims of administrative agencies, but their utility goes much beyond that. The robustness and quality of NFHS data must be judged by the soundness of its statistical parameters and their temporal and cross-sectional comparability, and not by the recent controversies regarding its funding agency.
The writer is Professor Emeritus at L J University, Ahmedabad