Hybrid Comparative Solution Boosts Multi-Object Tracking

GWANGJU, South Korea, Aug. 4, 2021 — A team at the Gwangju Institute of Science and Technology (GIST) in Korea, led by Moongu Jeon, implemented a technique called deep temporal appearance matching association, or Deep-TAM, to overcome short-term occlusion, which affects the ability of computer vision systems to simultaneously track objects. The framework was shown to achieve high performance without sacrificing computational speed.

Algorithms that can simultaneously track multiple objects are essential to applications that range from autonomous driving to advanced public surveillance. However, it is difficult for computers to discriminate between detected objects based on their appearance.

Object tracking, which involves recognizing persistent objects in video footage and following their movements, remains a difficult task for computers. While computers can simultaneously track more objects than humans can, they usually fail to tell different objects apart by appearance.

This, in turn, can lead the algorithm to mix up objects in a scene and ultimately produce incorrect tracking results.

Conventional tracking determines object trajectories by associating a bounding box with each detected object and establishing geometric constraints. The difficulty with this approach lies in accurately matching previously tracked objects with objects detected in the current frame. Differentiating detected objects based on features such as color usually fails because of changes in lighting conditions and occlusions.
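As a rough illustration of the conventional, geometry-only association described above, the sketch below matches existing tracks to new detections by bounding-box overlap (intersection over union, or IoU). It is a generic baseline written for this article, not the GIST team's code; the function names, the greedy matching strategy, and the `min_iou` threshold are all assumptions chosen for brevity.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Greedily pair tracks with detections, best overlap first.

    tracks: dict mapping a track id to its last known box.
    detections: list of boxes found in the current frame.
    min_iou: hypothetical cutoff below which no match is accepted.
    Returns a dict mapping track id to detection index.
    """
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches, used = {}, set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    return matches
```

A track whose object is hidden in the current frame simply gets no match here, and when two objects cross paths the overlap alone cannot tell them apart. That is the gap appearance-based matching is meant to fill.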

The researchers’ solution focused on enabling the tracking model to accurately extract the known features of detected objects and compare them not only with those of other objects in the frame, but also with a recorded history of known features.

To this end, the researchers combined joint-inference neural networks (JI-Nets) with long short-term memory networks (LSTMs). LSTMs help to associate stored appearances with those in the current frame; JI-Nets allow the appearances of two detected objects to be compared simultaneously from scratch. Using historical appearances in this way allowed the algorithm to overcome short-term occlusions of the tracked objects.
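The idea of comparing a new detection against a stored history of appearances can be sketched in a few lines. The example below is not Deep-TAM: it swaps the learned JI-Net and LSTM components for a plain cosine similarity averaged over recent feature vectors, and the class name, history length, and score threshold are hypothetical choices made only to show the data flow.

```python
from collections import deque
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


class TrackHistory:
    """Keeps the most recent appearance features of one tracked object."""

    def __init__(self, max_len=10):
        # Old features drop out automatically once max_len is reached.
        self.features = deque(maxlen=max_len)

    def add(self, feature):
        self.features.append(feature)

    def score(self, feature):
        """Mean similarity of a new feature to the stored history."""
        if not self.features:
            return 0.0
        total = sum(cosine(f, feature) for f in self.features)
        return total / len(self.features)


def best_track(histories, feature, min_score=0.5):
    """Return the id of the best-matching track, or None if none is close."""
    best_id, best_score = None, min_score
    for tid, hist in histories.items():
        s = hist.score(feature)
        if s > best_score:
            best_id, best_score = tid, s
    return best_id
```

Because each track keeps its history while the object is out of view, a detection that reappears after a few frames can still be scored against what the object looked like before it was hidden. In this simplified sketch, that is how short-term occlusion is bridged.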

“Compared to conventional methods that pre-extract features from each object independently, the proposed joint-inference method exhibited better accuracy in public surveillance tasks, namely pedestrian tracking,” Jeon said.

The researchers also offset a main drawback of deep learning — low speed — by adopting indexing-based GPU parallelization to reduce computing times. Tests on public surveillance data sets confirmed that the proposed tracking framework offers state-of-the-art accuracy and is therefore ready for deployment.

Published: August 2021

Glossary

computer vision: Computer vision enables computers to interpret and make decisions based on visual data, such as images and videos. It involves the development of algorithms, techniques, and systems that enable machines to gain an understanding of the visual world, similar to how humans perceive and interpret visual information. Key aspects and tasks within computer vision include: Image recognition: Identifying and categorizing objects, scenes, or patterns within images. This involves training...
tracking: 1. The process of following an object's movement; accomplished by focusing a radar beam on the reticle of an optical system on the object and plotting its bearing and distance at specific intervals. 2. In display technology, use of a light pen to move an object across a display screen.
machine vision: Machine vision, also known as computer vision or computer sight, refers to the technology that enables machines, typically computers, to interpret and understand visual information from the world, much like the human visual system. It involves the development and application of algorithms and systems that allow machines to acquire, process, analyze, and make decisions based on visual data. Key aspects of machine vision include: Image acquisition: Machine vision systems use various...
image comparison: A method used in imaging to detect subtle differences between two apparently similar pictures. It can be achieved by superimposing the negative of one photograph over a contact print of another, by projecting or displaying the images side by side, or by displaying the images in rapid sequence.
