The dataset, called Microsoft MAchine Reading COmprehension (MS MARCO), is based on anonymised real-world data.
By making it broadly available to researchers, the team is hoping to spur the kind of breakthroughs in machine reading that are already happening in image and speech recognition.
They also hope to facilitate the kind of advances that may lead to the long-term goal of 'artificial general intelligence,' or machines that can think like humans.
Right now, systems that can answer sophisticated questions are still in their infancy. Some can answer basic questions, like 'What day does Hanukkah start?' or 'What's 2,000 times 43?', said Majumder.
However, in many cases, search engines and virtual assistants will instead point the user to a set of search engine results.
Users can still get the information they need, but it requires sifting through the results and finding the answer on a web page.
These datasets can be used to teach AI systems to recognise questions and formulate answers, and eventually to create systems that can come up with their own answers to unique questions they have not seen before.
