Computer Vision System Mimics Human Visualization

A computer vision system that can identify objects based on the same method of visual learning that humans use has been developed at the UCLA Samueli School of Engineering. The system could be an advance in computer vision and a step toward general artificial intelligence (AI) systems, that is, computer systems that learn on their own, are intuitive, make decisions based on reasoning, and interact with humans in a more humanlike way.

Current computer vision systems are not designed to learn on their own. They must be trained on exactly what to learn, usually by reviewing thousands of images in which the objects they are trying to identify are labeled for them.

A computer vision system developed at UCLA can identify objects based on only partial glimpses, for example, by using these photo snippets of a motorcycle. Courtesy of UCLA Samueli.

The UCLA system uses a three-step approach. First, it breaks up an image into small chunks, which the researchers call “viewlets.” Second, it learns how these viewlets fit together to form the object in question. Third, it looks at what other objects are in the surrounding area and whether these objects are relevant to describing and identifying the primary object.
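The first of these steps, cutting an image into small chunks, can be illustrated with a minimal Python sketch. This is not the researchers' code: the article does not say how viewlets are chosen, so the regular grid, the function name extract_viewlets, and the size and stride parameters are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def extract_viewlets(image: Image, size: int, stride: int) -> List[Tuple[int, int, Image]]:
    """Cut an image into square patches on a regular grid.

    Returns (row, col, patch) tuples, where (row, col) is the top-left
    corner of the patch. The regular grid is an illustrative assumption,
    not the selection rule used by the UCLA system.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    height = len(image)
    width = len(image[0]) if height else 0
    viewlets = []
    for row in range(0, height - size + 1, stride):
        for col in range(0, width - size + 1, stride):
            # Slice `size` rows, then `size` columns from each of those rows.
            patch = [r[col:col + size] for r in image[row:row + size]]
            viewlets.append((row, col, patch))
    return viewlets


# A 4x4 toy image split into non-overlapping 2x2 patches.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = extract_viewlets(toy_image, size=2, stride=2)
```

Keeping each patch's position alongside its pixels matters for the later steps, which depend on how the pieces are arranged relative to one another.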

To help the new system learn more like humans, the engineers immersed it in an internet replica of the environment in which humans live. “Fortunately, the internet provides two things that help a brain-inspired computer vision system learn the same way humans do,” said professor Vwani Roychowdhury. “One is a wealth of images and videos that depict the same types of objects. The second is that these objects are shown from many perspectives — obscured, bird’s-eye, up close — and they are placed in different kinds of environments.”

The researchers drew insight into contextual learning from findings in cognitive psychology and neuroscience. “Contextual learning is a key feature of our brains, and it helps us build robust models of objects that are part of an integrated worldview where everything is functionally connected,” Roychowdhury said.

The UCLA system provides a scalable framework for unsupervised learning of object prototypes that enables identification of deformable objects from their parts, their different configurations and views, and their spatial relationships. Computationally, the object prototypes are represented as geometric associative networks.
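As a rough intuition for how a prototype built from parts and spatial relationships can support recognition from partial views, consider the toy Python sketch below. It is not the geometric associative network described in the paper. The PartGraph class, the part names, the offsets, and the tolerance check are all hypothetical, chosen only to show parts linked by expected relative positions.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Point = Tuple[float, float]


@dataclass
class PartGraph:
    """Toy prototype: each edge holds the expected offset between two named parts."""
    expected_offsets: Dict[Tuple[str, str], Point] = field(default_factory=dict)

    def add_relation(self, part_a: str, part_b: str, offset: Point) -> None:
        self.expected_offsets[(part_a, part_b)] = offset

    def consistent_relations(self, detections: Dict[str, Point], tolerance: float) -> int:
        """Count relations whose observed offset is within tolerance of the expected one."""
        count = 0
        for (a, b), (dx, dy) in self.expected_offsets.items():
            if a not in detections or b not in detections:
                continue  # a missing part contributes no evidence either way
            observed_dx = detections[b][0] - detections[a][0]
            observed_dy = detections[b][1] - detections[a][1]
            if abs(observed_dx - dx) <= tolerance and abs(observed_dy - dy) <= tolerance:
                count += 1
        return count


# Hypothetical prototype of a standing person (offsets in arbitrary units).
prototype = PartGraph()
prototype.add_relation("head", "torso", (0.0, 2.0))
prototype.add_relation("torso", "legs", (0.0, 3.0))

# The legs are hidden in this made-up detection, so only one relation can be checked.
partial_view = {"head": (5.0, 1.0), "torso": (5.2, 3.1)}
matches = prototype.consistent_relations(partial_view, tolerance=0.5)
```

Because a missing part is skipped rather than penalized, the visible head and torso still provide evidence for the object, which loosely mirrors the idea of identifying objects from partial glimpses.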

The system understands what a human body is by looking at thousands of images with people in them and then ignoring nonessential background objects. Courtesy of UCLA Samueli.

The researchers tested the system with about 9000 images, each showing people and other objects. The system was able to build a detailed model of the human body without external guidance and without the images being labeled. The researchers ran similar tests using images of motorcycles, cars, and airplanes. In all cases, their system performed better than or at least as well as traditional computer vision systems that have been developed with many years of training.

The research was published in Proceedings of the National Academy of Sciences (https://doi.org/10.1073/pnas.1802103115).

Published: December 2018

Glossary

machine learning: Machine learning (ML) is a subset of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that enable computers to improve their performance on a specific task through experience or training. Instead of being explicitly programmed to perform a task, a machine learning system learns from data and examples. The primary goal of machine learning is to develop models that can generalize patterns from data and make predictions or decisions without being...
artificial intelligence: The ability of a machine to perform certain complex functions normally associated with human intelligence, such as judgment, pattern recognition, understanding, learning, planning, and problem solving.
machine vision: Machine vision, also known as computer vision or computer sight, refers to the technology that enables machines, typically computers, to interpret and understand visual information from the world, much like the human visual system. It involves the development and application of algorithms and systems that allow machines to acquire, process, analyze, and make decisions based on visual data. Key aspects of machine vision include: Image acquisition: Machine vision systems use various...
computer vision: Computer vision enables computers to interpret and make decisions based on visual data, such as images and videos. It involves the development of algorithms, techniques, and systems that enable machines to gain an understanding of the visual world, similar to how humans perceive and interpret visual information. Key aspects and tasks within computer vision include: Image recognition: Identifying and categorizing objects, scenes, or patterns within images. This involves training...
