Behavioural scientists use social media to quickly and cheaply gather huge amounts of data about what people are thinking and doing, but researchers at Carnegie Mellon University in the US and McGill University in Canada have found that those massive datasets may be misleading.
Carnegie Mellon's Juergen Pfeffer and McGill's Derek Ruths said that scientists need to find ways of correcting for the biases inherent in the information gathered from Twitter and other social media, or to at least acknowledge the shortcomings of that data.
"Not everything that can be labelled as 'Big Data' is automatically great," Pfeffer said.
He said that many researchers think - or hope - that if they gather a large enough dataset they can overcome any biases or distortion that might lurk there.
Despite researchers' attempts to generalise their study results to a broad population, social media sites often have substantial population biases; generating the random samples that give surveys their power to accurately reflect attitudes and behaviour is problematic, scientists said.
Yet Ruths and Pfeffer said researchers seldom acknowledge, much less correct, these built-in sampling biases.
Other questions about data sampling may never be resolved because social media sites use proprietary algorithms to create or filter their data streams and those algorithms are subject to change without warning.
Most researchers are left in the dark, though others with special relationships to the sites may get a look at the sites' inner workings.
In an article published in the journal Science, the researchers also noted that not all "people" on these sites are even people.
Some are professional writers or public relations representatives who post on behalf of celebrities or corporations; others are simply phantom accounts. Some "followers" can be bought.
The social media sites try to hunt down and eliminate such bogus accounts - half of all Twitter accounts created in 2013 have already been deleted - but a lone researcher may have difficulty detecting those accounts within a dataset, according to Ruths and Pfeffer.