New AI system can predict human actions

Press Trust of India | Boston
Last Updated: Jun 22 2016 | 12:13 PM IST
In a breakthrough, scientists have developed an artificial intelligence (AI) system, trained on videos from YouTube and popular TV shows, that can predict whether two people will hug, kiss or shake hands.
Computer systems that predict actions would open up new possibilities ranging from robots that can better navigate human environments, to emergency response systems that predict falls, to virtual reality headsets that feed you suggestions for what to do in different situations.
Scientists at the Massachusetts Institute of Technology (MIT) in the US have made an important advance in predictive vision, developing an algorithm that anticipates interactions more accurately than existing algorithms.
Trained on YouTube videos and popular TV shows, the system can predict whether two individuals will hug, kiss, shake hands or slap five.
In a second scenario, it can also anticipate which object is likely to appear in a video five seconds later.
While human greetings may seem like arbitrary actions to predict, the task served as a more easily controllable test case for the researchers to study.
"Humans automatically learn to anticipate actions through experience, which is what made us interested in trying to imbue computers with the same sort of common sense," said Carl Vondrick, a PhD student at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL).
"We wanted to show that just by watching large amounts of video, computers can gain enough knowledge to consistently make predictions about their surroundings," said Vondrick.
Researchers created an algorithm that can predict "visual representations," which are basically freeze-frames showing different versions of what the scene might look like.
The algorithm employs techniques from deep-learning, a field of artificial intelligence that uses systems called "neural networks" to teach computers to pore over massive amounts of data to find patterns on their own.
Each of the algorithm's networks predicts a representation, which is then automatically classified as one of four actions: in this case, a hug, handshake, high-five or kiss.
The system then merges those actions into one that it uses as its prediction.
For example, three networks might predict a kiss, while a fourth might use the fact that someone else has entered the frame as a reason to predict a hug instead.
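The article does not say exactly how the networks' outputs are merged. As a rough illustration only, the toy sketch below assumes a simple majority vote over hypothetical per-network labels; the function name `merge_predictions` and the voting rule are assumptions, not the MIT team's published method.

```python
from collections import Counter

# The four actions named in the article.
ACTIONS = ("hug", "handshake", "high-five", "kiss")


def merge_predictions(network_labels):
    """Combine per-network action labels into one prediction.

    Assumption (hypothetical): a plain majority vote. On a tie, the label
    seen first wins, because Counter.most_common orders equal counts by
    first occurrence.
    """
    if not network_labels:
        raise ValueError("need at least one network prediction")
    for label in network_labels:
        if label not in ACTIONS:
            raise ValueError(f"unknown action: {label}")
    label, _count = Counter(network_labels).most_common(1)[0]
    return label


# The article's example: three networks predict a kiss, one predicts a hug.
print(merge_predictions(["kiss", "kiss", "kiss", "hug"]))  # kiss
```

A majority vote is the simplest way to combine several guesses. A real system could instead weight each network by its confidence, which this sketch does not attempt.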
After training the algorithm on 600 hours of unlabelled video, the team tested it on new videos showing both actions and objects.
When shown a video of people who were one second away from performing one of the four actions, the algorithm correctly predicted the action more than 43 per cent of the time, compared with 36 per cent for existing algorithms.
Even humans make mistakes on this task: human subjects correctly predicted the action only 71 per cent of the time, researchers said.
First Published: Jun 22 2016 | 12:13 PM IST