Big data is perhaps the biggest hype of recent years. People from every sector, be it industry, sports, health care, or national policymaking, are now obsessed with big data and aspire to use it in every aspect of life.
Big data comprises lots of variables, spooled with loads of data. Collecting data from every possible source is the fashion nowadays, quite often without any idea of what to do, or what can be done, with it. And quite often we do not know how to analyse data with so many variables, many of which may have complicated and unknown relationships with one another. With n variables there are n(n-1)/2 possible pairs, so the number of pairs showing significant correlation increases in the order of the square of the number of variables. Even 'independent' pairs of variables might exhibit high correlation; for example, the divorce rate in Maine, US, during 2000-2009 correlates nicely with the per capita consumption of margarine in those years. The number of such 'spurious' or 'nonsense' correlations also increases in the order of the square of the number of variables. More than five years ago, Nassim Nicholas Taleb, author of the bestselling book The Black Swan: The Impact of the Highly Improbable, illustrated through a simulation exercise that with 500 'independent' variables the number of 'significant' spurious correlations is nearly 6,000, whereas this number grows to 140,000 for 2,500 'independent' variables! Certainly, correlation does not imply causation, but in real life it is almost impossible to identify the 'spurious' ones among millions of correlations involving thousands of variables.
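The quadratic growth of spurious correlations can be checked with a small experiment. The following is a minimal sketch in Python, not a reproduction of Taleb's simulation: the series length of 10 observations (chosen to resemble a ten-year annual series), the cut-off of 0.8 for calling a correlation 'high', and the function names are all assumptions made for illustration, so the counts it prints will not match his figures.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def n_pairs(n_vars):
    """Number of distinct pairs among n_vars variables: n(n-1)/2."""
    return n_vars * (n_vars - 1) // 2


def count_spurious(n_vars, n_obs=10, threshold=0.8, seed=0):
    """Count pairs of independent random series with |r| above threshold.

    n_obs and threshold are illustrative assumptions, not values
    taken from the article or from Taleb's exercise.
    """
    rng = random.Random(seed)
    # Every series is drawn independently, so any high correlation is spurious.
    series = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_vars)]
    return sum(
        1 for x, y in combinations(series, 2) if abs(pearson(x, y)) > threshold
    )


if __name__ == "__main__":
    for n_vars in (50, 100, 200):
        print(n_vars, n_pairs(n_vars), count_spurious(n_vars))
```

Each doubling of the number of variables roughly quadruples the number of pairs, and since the chance that any one independent pair crosses the cut-off stays fixed, the count of spurious 'high' correlations should grow at about the same rate.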
With the ever-expanding horizon of the Internet of Things (IoT), big data is continuously becoming bigger. The growth of data is exponential: the size of the digital universe will double every two years beyond 2020. And we do not know how to leverage that volume of data, for we have neither the statistical expertise to handle thousands of variables and eliminate 'spurious' correlations nor the computational algorithms and equipment to handle billions of data points. Even where algorithms are available, standard computers are inadequate for this gigantic volume of data.
However, the ocean of big data contains limitless possibilities, and the aspiration to extract knowledge from the heartbeats of big data is also huge. The problem is that the present technology and expertise are still primitive. Let's be honest and admit that. Still, our adventure might succeed in particular cases, with special prior knowledge and expertise in the topic and, of course, effective use of 'instinct', but certainly not in general. That's why I'm very sceptical about running routine software packages to analyse big data; instead, we need to develop the required tools very carefully, case by case. And that is a time-consuming research exercise that can only be performed by top statisticians and computer scientists working together.
There are, of course, some big data success stories. In their 2003 book Scoring Points: How Tesco Continues to Win Customer Loyalty, Clive Humby, Terry Hunt and Tim Phillips discussed how the UK-based grocer Tesco fuelled rapid growth by analysing data on customer purchase behaviour. Today we have an unprecedented ability to collect and store data. But we should always be very careful when using such monitoring infrastructure to understand individuals' life pathways from loads of data.
In May 2017, Cisco reported that only 26 per cent of survey respondents were successful with IoT initiatives, indicating a 74 per cent failure rate. In November 2017, Gartner analyst Nick Heudecker inferred that about 85 per cent of big data projects fail. My personal belief is that the actual failure percentage is even higher, as 'success' is not well defined in most situations dealing with big data, making it difficult to gauge the quantum of failures, or even to understand a 'failure'. When an organisation is happy with the apparent 'success' of a strategy framed by big data analytics, it fails to understand what more could have been done, unless the endeavour collapses like the Google Flu Trends experiment. Also, there is serious doubt about data quality in most cases: according to a Harvard Business Review article of September 2017, only 3 per cent of companies' data meets basic quality standards.
In the 1958 Hollywood movie The Blob, a meteorite landed in a small Pennsylvania town carrying an alien amoeba, which expanded and swallowed up people and structures, threatening to envelop the whole town. Today's 'big data' sometimes resembles that amoeba, which devours everything. In the process, big data is getting bigger, and so are our aspirations. However, our capacity to handle data has not grown proportionately. In the six-decade-old movie, the air force finally had to swoop in and airlift the amoeba to the Arctic. Well, is that the appropriate way to stop the 'Blob' until one is equipped to handle it?
The writer is professor of statistics at the Indian Statistical Institute, Kolkata