In 2012, Geoffrey Hinton changed the way machines see the world.
Along with two graduate students at the University of Toronto, Hinton, a professor there, built a system that could analyse thousands of photos and teach itself to identify common objects like flowers and cars with an accuracy that didn’t seem possible.
He and his students soon moved to Google, and the mathematical technique that drove their system, called a neural network, spread across the tech world. Neural networks are how autonomous cars recognise things like street signs and pedestrians.
But as Hinton himself points out, his idea has had its limits. If a neural network is trained on images that show a coffee cup only from the side, for example, it is unlikely to recognise a coffee cup turned upside down.
Now Hinton and Sara Sabour, a young Google researcher, are exploring an alternative mathematical technique that he calls a capsule network. The idea is to build a system that sees more like a human. If a neural network sees the world in two dimensions, a capsule network can see it in three.
Hinton, a 69-year-old British expatriate, opened Google’s artificial intelligence lab in Toronto this year. The new lab is emblematic of what some believe to be the future of cutting-edge tech research: Much of it is expected to happen outside the United States, in Europe, China and longtime AI research centres like Toronto that are more welcoming to immigrant researchers.
Sabour is an Iranian researcher who wound up in Toronto after the US government denied her a visa to study computer vision at the University of Washington. Her task is to turn Hinton’s conceptual idea into a mathematical reality, and the project is bearing fruit. They recently published a paper showing that in certain situations their method can more accurately recognise objects when viewing them from unfamiliar angles. “It can generalise much better than the traditional neural nets everyone is now using,” Sabour said.
When I walked into his office this month, Hinton, dressed in his usual button-down shirt and sweater, handed me two large white blocks. They looked like something he had found at the bottom of an old toy chest.
He explained the blocks were two halves of a pyramid, and he asked if I could put the pyramid back together. That didn’t seem too hard. The blocks were oddly shaped, but each had only five sides. All I had to do was find the two sides that matched and line them up. But I couldn’t.
Most people fail this test, he told me, including two tenured professors at the Massachusetts Institute of Technology. One declined to try, and the other insisted it wasn’t possible. It is possible. But we all failed, Hinton explained, because the puzzle undercuts the natural way we see something like a pyramid.
We do not recognise an object by looking at one side and then another and then another. We picture the whole thing sitting in three-dimensional space. And because of the way the puzzle cuts the pyramid in two, it prevents us from picturing it in three-dimensional space as we normally would.
With his capsule networks, Hinton aims to finally give machines the same three-dimensional perspective that humans have — allowing them to recognise a coffee cup from any angle after learning what it looks like from only one. This is not something that neural networks can do. “It is a fact that is ignored by researchers in computer vision,” he said. “And that is a huge mistake.”
Loosely modelled on the web of neurons in the human brain, neural networks are algorithms that can learn discrete tasks by identifying patterns in large amounts of data. By analysing thousands of car photos, for instance, a neural network can learn to recognise a car.
© 2017 The New York Times