
Multicamera Technology Enables Multiperson-Pose Estimation Method

A method has been developed that enables computers to understand the body poses and movements of multiple people — including facial expressions and hand positions — using video in real time. The novel method was created using the Panoptic Studio, a multiview system for social motion capture. The Panoptic Studio consists of a two-story dome embedded with 500 video cameras and was developed by researchers at Carnegie Mellon University’s Robotics Institute.


Carnegie Mellon University researchers have developed methods to detect the body pose, including facial expressions and hand positions, of multiple individuals. This enables computers to not only identify parts of the body, but to understand how they are moving and positioned. Courtesy of Carnegie Mellon University.

Tracking multiple people in real time, particularly in situations where they may be in contact with each other, presents a number of challenges. The research team took a bottom-up approach, first localizing all the body parts in a scene — arms, legs, faces, etc. — and then associating those parts with specific individuals. 
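The bottom-up idea can be illustrated with a toy sketch: first collect every candidate part in the image, then decide which parts belong together. The article does not describe how the researchers perform the association step, so the greedy nearest-distance matching below is only a stand-in, and the function name `associate`, the `max_dist` threshold and the sample coordinates are all hypothetical.

```python
from math import dist


def associate(necks, wrists, max_dist=150.0):
    """Pair neck and wrist candidates greedily, closest pairs first.

    necks, wrists: lists of (x, y) pixel coordinates found anywhere in the image.
    Returns a dict mapping neck index -> wrist index.
    Note: distance-based matching is an illustrative assumption, not the
    association rule used in the research.
    """
    # Score every possible neck/wrist pairing, shortest distance first.
    candidates = sorted(
        (dist(n, w), i, j)
        for i, n in enumerate(necks)
        for j, w in enumerate(wrists)
    )
    pairs, used_wrists = {}, set()
    for d, i, j in candidates:
        if d > max_dist:
            break  # every remaining pairing is farther apart still
        if i in pairs or j in used_wrists:
            continue  # each part is assigned to one person only
        pairs[i] = j
        used_wrists.add(j)
    return pairs


# Two people: parts are detected first, with no notion of who owns them.
necks = [(100, 100), (400, 120)]
wrists = [(420, 200), (90, 190)]
people = associate(necks, wrists)
```

Here the detections arrive in arbitrary order, and the grouping step is what turns a bag of parts into per-person poses. A real system would need a far more robust association score, especially when people overlap or touch.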

Hand detection can be an even greater challenge, because a camera is unlikely to see all parts of the hand at the same time. But for every image that shows only part of the hand, there often exists another image from a different angle with a full or complementary view of the hand, said researcher Hanbyul Joo. That’s where the researchers made use of the multicamera Panoptic Studio. 

“A single shot gives you 500 views of a person’s hand, plus it automatically annotates the hand position,” said Joo. “Hands are too small to be annotated by most of our cameras, however, so for this study we used just 31 high-definition cameras, but still were able to build a massive data set.”
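The benefit of complementary views can be sketched in a few lines: no single camera sees the whole hand, but the views taken together cover every keypoint. This is a minimal illustration only. The camera names, the keypoint names and the `min_views=2` threshold are assumptions (two views being the minimum needed to locate a point in 3D), and the article does not say how the researchers combine views.

```python
def annotatable_keypoints(views, min_views=2):
    """Return the keypoints seen in at least `min_views` camera views.

    views: dict mapping a camera name to the set of hand keypoints
    that camera sees without occlusion (hypothetical input format).
    """
    counts = {}
    for visible in views.values():
        for keypoint in visible:
            counts[keypoint] = counts.get(keypoint, 0) + 1
    return {k for k, c in counts.items() if c >= min_views}


# Each camera sees only part of the hand.
views = {
    "cam_a": {"thumb_tip", "index_tip", "wrist"},
    "cam_b": {"index_tip", "pinky_tip", "wrist"},
    "cam_c": {"thumb_tip", "pinky_tip"},
}
covered = annotatable_keypoints(views)
```

In this example every keypoint is visible in two of the three views, so all of them could be annotated even though each individual camera misses at least one.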

The novel method for tracking 2D human form and motion could open up new ways for people and machines to interact with each other, such as communicating with computers simply by pointing at things, said professor Yaser Sheikh.

Detecting the nuances of nonverbal communication between individuals would allow robots to serve in social spaces. A self-driving car could get an early warning that a pedestrian was about to step into the street by monitoring body language. Enabling machines to understand human behavior also could lead to novel approaches to behavioral diagnosis and rehabilitation.


In sports analytics, real-time pose detection would make it possible for computers not only to track the position of each player on the field of play, as is now the case, but also to know what players are doing with their arms, legs and heads at each point in time.

Developed a decade ago, the Panoptic Studio served as the site of the experiments that led to the new method.

“The Panoptic Studio supercharges our research,” Sheikh said.

The studio is now being used to improve body, face and hand detectors by training them jointly. As work progresses from 2D models of humans to 3D models, the facility's ability to automatically generate annotated images will be crucial.

To encourage more research and applications, the team has released its computer code for both multiperson and hand-pose estimation. It is being used by research groups, and more than 20 commercial groups, including automotive companies, have expressed interest in licensing the technology, Sheikh said.

When the Panoptic Studio was built with support from the National Science Foundation, it was not clear what impact it would have, said Sheikh.

“Now, we’re able to break through a number of technical barriers primarily as a result of that NSF grant 10 years ago,” he added. “We're sharing the code, but we're also sharing all the data captured in the Panoptic Studio.”

The research on multiperson and hand-pose detection methods will be presented at the Computer Vision and Pattern Recognition Conference, CVPR 2017, July 21-26, 2017, in Honolulu.


Carnegie Mellon University researchers have developed methods that enable computers to track the body pose of multiple individuals. Courtesy of Carnegie Mellon University.

 


Published: July 2017
