For more than 50 years, linguists and computer scientists have tried to get computers to understand human language by programming semantics as software, researchers said.
Now, a University of Texas at Austin linguistics researcher, Katrin Erk, is using supercomputers to develop a new method for helping computers learn natural language.
Instead of hard-coding human logic or deciphering dictionaries to try to teach computers language, Erk decided to try a different tactic: feed computers a vast body of texts (which are a reflection of human knowledge) and use the implicit connections between the words to create a map of relationships.
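A minimal sketch of the general idea (not Erk's actual system): count how often each word appears near other words across many sentences, and treat those counts as the word's coordinates, so that words used in similar contexts end up near each other. The toy corpus and two-word window below are illustrative assumptions.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Build one context-count vector per word, keyed by nearby words."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # Every word within `window` positions counts as context.
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

# Toy corpus; a real model would be trained on hundreds of millions of words.
corpus = [
    "the bank approved the loan",
    "the bank raised interest rates",
    "she sat on the river bank",
]
print(cooccurrence_vectors(corpus)["bank"].most_common(5))
```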
Creating a model that can accurately reproduce the intuitive human ability to distinguish word meanings requires a great deal of text and a great deal of analytical horsepower, researchers said.
"The lower end for this kind of a research is a text collection of 100 million words.
Erk initially conducted her research on desktop computers but then began using parallel computing systems.
Access to a special Hadoop-optimised subsystem allowed Erk and her collaborators to expand the scope of their research.
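Counting of this kind splits naturally across a cluster: in the MapReduce model that Hadoop implements, mappers emit (word, context-word) pairs from their slice of the corpus and reducers sum the counts for each pair. The sketch below follows Hadoop Streaming conventions (read lines from standard input, write tab-separated key/value pairs); the script name, window size, and invocation are hypothetical.

```python
#!/usr/bin/env python3
"""Hadoop Streaming sketch: count (word, context-word) pairs across a cluster.
Hypothetical invocation:
  hadoop jar hadoop-streaming.jar -input corpus/ -output counts/ \
    -mapper "python3 wordpairs.py map" -reducer "python3 wordpairs.py reduce"
"""
import sys

WINDOW = 2  # illustrative context window

def mapper():
    # Emit one line per (word, context-word) pair: "word context<TAB>1"
    for line in sys.stdin:
        tokens = line.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)):
                if j != i:
                    print(f"{word} {tokens[j]}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so identical keys arrive contiguously.
    current, total = None, 0
    for line in sys.stdin:
        key, count = line.rstrip("\n").split("\t")
        if current is not None and key != current:
            print(f"{current}\t{total}")
            total = 0
        current = key
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if len(sys.argv) > 1 and sys.argv[1] == "map" else reducer()
```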
"We use a gigantic 10,000-dimensional space with all these different points for each word to predict paraphrases," Erk said.
