AI Learns to Predict Human Behavior from Videos
A team from Columbia University has developed an algorithm to give machines a more intuitive sense of what may happen next after an action. The computer vision algorithm is designed to predict human interactions and body language in video. The researchers said this capability could have applications for assistive technology, autonomous vehicles, and collaborative robots, and that it is the most accurate method to date for predicting video action events up to several minutes in the future.
“Our algorithm is a step toward machines being able to make better predictions about human behavior and thus better coordinate their actions with ours,” said Carl Vondrick, assistant professor of computer science at Columbia, who directed the study. “Our results open a number of possibilities for human-robot collaboration, autonomous vehicles, and assistive technology.”
Previous attempts at predictive machine learning, including those by the team, focused on predicting just one action at a time. The algorithms decide whether to classify the action as a hug, high-five, handshake, or even a nonaction like “ignore.” But when the uncertainty is high, most machine learning models are unable to find commonalities between the possible options.
This approach looks at the longer-range prediction problem from a different angle. After analyzing thousands of hours of movies, sports games, and shows like “The Office,” the system learns to predict hundreds of activities by leveraging higher-level associations between people, animals, and objects.
“Not everything in the future is predictable,” said Didac Suris, a Ph.D. student in engineering and co-lead author of the paper. “When a person cannot foresee exactly what will happen, they play it safe and predict at a higher level of abstraction. Our algorithm is the first to learn this capability to reason abstractly about future events.”
The team said it had to revisit questions in mathematics dating back to the ancient Greeks. In geometry, students learn rules about straight lines, parallel lines, and so on. Machine learning systems typically obey these rules as well, but other, non-Euclidean geometries have counterintuitive properties, such as straight lines that bend or triangles with curved sides. The team used these unusual geometries to build AI models that organize high-level concepts and predict future human behavior.
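One such geometry is hyperbolic space, which is often used in machine learning to embed tree-like hierarchies. The sketch below (an illustration of the general idea, not the paper's actual model) computes distances in the Poincaré ball, a standard model of hyperbolic space: points near the boundary of the unit ball are pushed far apart, giving the space the exponentially growing "room" that hierarchies of concepts need.

```python
import numpy as np

def poincare_distance(u, v):
    # Distance between two points inside the unit ball, using the
    # standard Poincaré ball metric:
    #   d(u, v) = arcosh(1 + 2*|u-v|^2 / ((1-|u|^2) * (1-|v|^2)))
    sq_dist = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq_dist / denom)

# Near the origin, distances behave almost like ordinary Euclidean ones...
a, b = np.array([0.1, 0.0]), np.array([0.0, 0.1])
print(poincare_distance(a, b))  # small, roughly 0.28

# ...while points near the boundary are far apart, even though their
# Euclidean separation is similar. This stretching is what lets
# hyperbolic embeddings store tree-like concept hierarchies compactly.
c, d = np.array([0.95, 0.0]), np.array([0.0, 0.95])
print(poincare_distance(c, d))  # much larger, roughly 6.6
```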
“Prediction is the basis of human intelligence,” said Aude Oliva, a senior research scientist at MIT and co-director of the MIT-IBM Watson AI Lab; she is an expert in AI and human cognition who was not involved in the study. “Machines make mistakes that humans never would because they lack our ability to reason abstractly. This work is a pivotal step toward bridging this technological gap.”
The mathematical framework developed by the researchers enables machines to organize events by how predictable they are in the future. The system categorizes activities on its own, is aware of uncertainty, and provides more specific actions when there is certainty — as opposed to more generic predictions when there is not.
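This hedging behavior can be illustrated with a toy back-off scheme (a sketch of the idea only, with a hypothetical hierarchy and threshold, not the researchers' framework): if no specific action is predicted with enough confidence, the probability mass is pooled at a more abstract parent category and that category is reported instead.

```python
# Hypothetical action hierarchy: specific actions roll up to
# more abstract categories.
PARENT = {
    "handshake": "greeting",
    "high-five": "greeting",
    "hug": "greeting",
}

def predict(probs, threshold=0.5):
    """Return the most specific label that clears the confidence
    threshold; otherwise back off to the parent category."""
    best = max(probs, key=probs.get)
    if probs[best] >= threshold:
        return best  # confident: commit to the specific action
    # Uncertain: pool probability mass at the parent level and
    # report the abstract category instead.
    pooled = {}
    for action, p in probs.items():
        parent = PARENT.get(action, "interaction")
        pooled[parent] = pooled.get(parent, 0.0) + p
    return max(pooled, key=pooled.get)

print(predict({"handshake": 0.7, "hug": 0.2, "high-five": 0.1}))
# confident -> "handshake"
print(predict({"handshake": 0.35, "hug": 0.33, "high-five": 0.30}))
# uncertain -> backs off to "greeting"
```

The design choice mirrors the article's point: a generic but correct prediction ("greeting") is more useful to a robot than a specific but likely wrong one.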
The technique, the researchers said, could move computers closer to being able to interpret a situation and make a nuanced decision, rather than execute a preprogrammed action. It’s a critical step in building trust between humans and computers. “If machines can understand and anticipate our behaviors, computers will be able to seamlessly assist people in daily activity,” said Ruoshi Liu, a Ph.D. student in engineering at Columbia and co-lead author of the paper.
Though the new algorithm is able to make more accurate predictions on benchmark tasks than previous methods, the next steps are to verify that it works outside the lab, Vondrick said. If the system can work in diverse settings, it could open possibilities to deploy machines and robots to improve safety, health, and security, the researchers said. The group intends to continue improving the algorithm’s performance with larger data sets, more computing power, and other forms of geometry.
The study was presented at the Conference on Computer Vision and Pattern Recognition, June 24, 2021 (www.arxiv.org/pdf/2101.01600.pdf).