Researchers from the US have developed an artificial intelligence (AI) system that surfs the internet, extracts information from the available plain text and organises it for quantitative analysis in very little time.
Recently at the Association for Computational Linguistics' Conference on Empirical Methods in Natural Language Processing, researchers from the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory won a best-paper award for a new approach to information extraction that turns conventional machine learning on its head.
Most machine-learning systems work by combing through training examples and looking for patterns that correspond to classifications provided by human annotators.
In their new paper, the MIT researchers train their system on scanty data -- because in the scenario they're investigating, that's usually all that's available. But then they find the limited information an easy problem to solve.
"In information extraction, traditionally, in natural-language processing, you are given an article and you need to do whatever it takes to extract correctly from this article," said Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science.
"That's very different from what you or I would do. When you are reading an article that you cannot understand, you are going to go on the web and find one that you can understand," Barzilay, who is also a senior author of the paper, added.
A machine-learning system assigns each of its classifications a confidence score -- which is a measure of the statistical likelihood that the classification is correct -- given the patterns discerned in the training data.
With the researchers' new system, if the confidence score is too low, the system automatically does a web search to pull up texts likely to contain the data it is trying to extract.
It then attempts to extract the relevant data from one of the new texts and reconciles the results with those of its initial extraction.
If the confidence score remains too low, it moves on to the next text pulled up by the search string, and so on.
Eventually, the system learns how to generate search queries, gauge the likelihood that a new text is relevant to its extraction task, and determine the best strategy for fusing the results of multiple attempts at extraction.
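The search-and-retry loop described above can be sketched roughly as follows. This is only an illustration, not the researchers' implementation: the function names, the fixed confidence threshold and the "keep the more confident result" fusion rule are all assumptions made for the example, whereas the report says the actual system learns how to query, filter and fuse. The toy extractor and toy search function are stand-ins so that the sketch runs on its own.

```python
from typing import Callable, Iterable, Optional, Tuple

# (extracted value, confidence score between 0 and 1)
Extraction = Tuple[Optional[str], float]


def reconcile(current: Extraction, candidate: Extraction) -> Extraction:
    """Assumed fusion rule: keep whichever extraction is more confident."""
    return candidate if candidate[1] > current[1] else current


def extract_with_fallback(
    article: str,
    extract: Callable[[str], Extraction],
    search: Callable[[str], Iterable[str]],
    threshold: float = 0.8,  # hypothetical cut-off, not from the paper
) -> Extraction:
    """Extract from the article; if confidence is low, try other texts."""
    best = extract(article)
    if best[1] >= threshold:
        return best
    # Confidence too low: work through the texts returned by a search.
    for text in search(article):
        best = reconcile(best, extract(text))
        if best[1] >= threshold:
            break  # confident enough, stop looking at further texts
    return best


def toy_extract(text: str) -> Extraction:
    """Stand-in extractor: first number in the text, trusted more if 'confirmed' appears."""
    words = text.split()
    numbers = [w for w in words if w.isdigit()]
    if not numbers:
        return (None, 0.0)
    confidence = 0.9 if "confirmed" in words else 0.4
    return (numbers[0], confidence)


def toy_search(query: str) -> Iterable[str]:
    """Stand-in for a web search: returns a fixed list of texts."""
    return [
        "no figures here",
        "police confirmed 12 injured",
        "rumours say 30 injured",
    ]


result = extract_with_fallback("reports say about 10 injured", toy_extract, toy_search)
print(result)
```

In this toy run the first article yields only a low-confidence figure, so the loop moves through the search results until one clears the threshold; the third text is never read.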
--IANS