A Virginia Tech professor has received a Google Faculty Research Award to support his work on detecting human-object interaction in images and videos, research that could aid the development of socially intelligent machines and enable other high-impact applications.

Jia-Bin Huang, an assistant professor in the electrical and computer engineering department, received the award in the machine perception category. The award supports his work on two challenges in detecting human-object interaction: modeling the relationship between a person and relevant objects and scenes to gather contextual information, and automatically mining hard examples from unlabeled but interaction-rich videos.

According to Huang, although significant progress has been made in classifying, detecting, and segmenting objects, representing images and videos as collections of isolated object instances fails to capture the information essential for understanding activity. The goal is to localize persons and object instances in an image or video and to recognize the interaction, if any, between each person-object pair. The result is a structured representation: a visually grounded graph over the humans and the object instances with which they interact.

“Understanding human activity in images and/or videos is a fundamental step toward building socially aware agents, semantic image/video retrieval, captioning, and question-answering,” Huang said.