New tool to rate quality of Wikipedia entries

Press Trust of India Beijing
Last Updated : Aug 10 2014 | 12:55 PM IST
Computer scientists in China have devised a software algorithm that can automatically check a particular entry on Wikipedia and rank it according to quality.
Jingyu Han and Kejia Chen of Nanjing University of Posts and Telecommunications said that the quality of data on Wikipedia has for many years been the focus of user attention.
Its detractors suggest that it can never be a valid information source in the way that a proprietary encyclopedia might be because the contributors and editors are not under the direct control of a single publisher with a vested interest in quality control.
Its supporters suggest that the social nature of contributions and edits and the online tracking of changes are among Wikipedia's greatest strengths rather than weaknesses.
Nevertheless, it would quiet the detractors if there were a way to quantify the quality of Wikipedia entries in an objective and automated manner, the researchers said.
To address this, Han and Chen turned to Bayesian statistics to help them create just such a system.
The notion of finding evidence based on an analysis of probabilities was first described by the 18th-century mathematician and theologian Thomas Bayes.
Bayesian probabilities were then utilised by Pierre-Simon Laplace to pioneer a new statistical method.
Today, Bayesian analysis is commonly used to assess the content of emails, estimate the probability that a message is spam, or junk mail, and filter it from the user's inbox if that probability is high.
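As a rough illustration of the spam-filtering idea, the sketch below applies Bayes' rule to the words of a message under a naive independence assumption. The function names, the word likelihoods and the 0.9 threshold are all hypothetical values chosen for the example; they do not come from the research described here.

```python
from math import exp, log

# Hypothetical per-word likelihoods: P(word | spam) and P(word | not spam).
P_WORD_GIVEN_SPAM = {"prize": 0.30, "meeting": 0.02}
P_WORD_GIVEN_HAM = {"prize": 0.01, "meeting": 0.20}
SPAM_THRESHOLD = 0.9  # assumed cut-off for filtering a message


def spam_probability(words, prior_spam=0.5):
    """Return P(spam | words) using Bayes' rule with independent words."""
    log_spam = log(prior_spam)
    log_ham = log(1.0 - prior_spam)
    for word in words:
        # Words without known likelihoods carry no evidence and are skipped.
        if word in P_WORD_GIVEN_SPAM and word in P_WORD_GIVEN_HAM:
            log_spam += log(P_WORD_GIVEN_SPAM[word])
            log_ham += log(P_WORD_GIVEN_HAM[word])
    # Normalise in log space to avoid underflow on long messages.
    top = max(log_spam, log_ham)
    spam = exp(log_spam - top)
    ham = exp(log_ham - top)
    return spam / (spam + ham)


def should_filter(words):
    """Filter the message only when the spam probability is high."""
    return spam_probability(words) > SPAM_THRESHOLD
```

With these made-up numbers, a message containing "prize" is filtered, while one containing "meeting" stays in the inbox.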
Han and Chen have now used a dynamic Bayesian network (DBN) to analyse the content of Wikipedia entries in a similar manner.
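The report does not describe the structure of the researchers' network, so the following is only a minimal sketch of how the simplest kind of DBN, a two-state hidden Markov model, might update a belief about an article's quality as revisions arrive. The states, the observation labels and every probability below are assumptions made up for the illustration, not the published model.

```python
# Hypothetical model: hidden quality state per revision, one observed signal.
PRIOR = {"low": 0.5, "high": 0.5}
TRANSITION = {  # P(next state | current state)
    "low": {"low": 0.8, "high": 0.2},
    "high": {"low": 0.1, "high": 0.9},
}
EMISSION = {  # P(observed edit type | state)
    "low": {"cited_edit": 0.3, "vandalism": 0.7},
    "high": {"cited_edit": 0.8, "vandalism": 0.2},
}


def forward_filter(observations, prior=PRIOR):
    """Return P(state | all observations so far) by forward filtering."""
    belief = dict(prior)
    for obs in observations:
        # Predict: push the current belief through the transition model.
        predicted = {
            state: sum(belief[prev] * TRANSITION[prev][state] for prev in belief)
            for state in belief
        }
        # Update: weight each state by how well it explains the observation.
        weighted = {s: predicted[s] * EMISSION[s][obs] for s in predicted}
        total = sum(weighted.values())
        belief = {s: w / total for s, w in weighted.items()}
    return belief


def quality_rank(observations):
    """Pick the most probable quality state after the revision history."""
    belief = forward_filter(observations)
    return max(belief, key=belief.get)
```

In this toy setting, a history of well-cited edits shifts the belief toward "high", and repeated vandalism shifts it toward "low".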
Very low-ranking entries might be flagged for editorial attention to raise the quality. By contrast, high-ranking entries could be marked in some way as the definitive entry so that such an entry is not subsequently overwritten with lower quality information.
The team has tested its algorithm on sets of several hundred articles, comparing the computer's automated quality assessment with assessment by a human user.
Their algorithm outperforms a human user by up to 23 per cent in correctly classifying the quality rank of a given article in the set.
The research is published in the International Journal of Information Quality.
First Published: Aug 10 2014 | 12:55 PM IST
