Once 2D measurements are obtained using the researchers' method, they are fed into a transformer-based encoder (a type of deep learning model), which extracts the most relevant high-dimensional features of the scene. These features then pass to a decoder built on a multiscale attention network, which simultaneously outputs the class, location, and size of every target in the scene.
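The core operation inside a transformer encoder is scaled dot-product attention, which weighs how strongly each feature should draw on every other feature. The sketch below illustrates that single operation with made-up shapes and random values; it is not the authors' SPOD network, only the generic mechanism their encoder and attention-based decoder build on.

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: compare queries to keys, then
    # mix the values according to the resulting softmax weights.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # query-key similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)              # softmax: each row sums to 1
    return w @ V, w                                    # attended features, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 hypothetical feature tokens, dimension 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, weights = attention(Q, K, V)
print(out.shape)              # one attended feature vector per token
```

A multiscale attention network applies this kind of weighting at several spatial resolutions, which is what lets the decoder focus on target regions of different sizes.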
Automating advanced visual tasks typically requires detailed images of a scene to extract the features necessary to identify an object. However, this requires either complex imaging hardware or complicated reconstruction algorithms, which leads to high computational cost, long running time, and heavy data transmission load. For this reason, the traditional image-first, perceive-later approaches may not be best for object detection.
Image-free sensing methods based on single-pixel detectors can cut down on the computational power needed for object detection. Instead of employing a pixelated detector such as a CMOS or CCD, single-pixel imaging illuminates the scene with a sequence of structured light patterns and then records the transmitted light intensity to acquire the spatial information of objects. This information is then used to computationally reconstruct the object or to calculate its properties.
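The single-pixel acquisition described above can be written as one inner product per pattern: project a structured pattern onto the scene and record the total transmitted intensity with the bucket detector. The sketch below simulates that measurement model with random binary patterns and an arbitrary 32x32 scene; the sizes and 5% sampling rate are assumptions for illustration, not the paper's setup.

```python
import numpy as np

# Assumed forward model of single-pixel sensing: each measurement y_i is the
# inner product of one illumination pattern P_i with the scene x.
rng = np.random.default_rng(1)
scene = rng.random((32, 32))                  # hypothetical scene, 32x32 pixels
n_pixels = scene.size
n_patterns = int(0.05 * n_pixels)             # 5% sampling rate -> 51 patterns
patterns = rng.integers(0, 2, size=(n_patterns, n_pixels))  # binary light patterns
y = patterns @ scene.ravel()                  # one scalar intensity per pattern
print(y.shape)                                # 51 measurements for a 1024-pixel scene
```

The point of image-free methods such as SPOD is to feed these few scalar measurements directly to a perception network instead of first solving the (expensive) inverse problem of reconstructing `scene` from `y`.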
According to the researchers, the small-size, optimized pattern sampling used by SPOD achieves high image-free sensing accuracy with roughly an order of magnitude fewer pattern parameters than conventional full-size pattern sampling.
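The scaling behind the "order of magnitude fewer parameters" claim is simple arithmetic: a pattern's parameter count grows with the square of its side length. The sizes below are hypothetical, chosen only to show how shrinking the pattern yields roughly a tenfold reduction; they are not the paper's actual dimensions.

```python
# Hypothetical pattern sizes, for illustrating the scaling argument only.
full_side, small_side = 128, 40           # assumed full-size vs. small optimized pattern
full_params = full_side ** 2              # 16384 parameters per full-size pattern
small_params = small_side ** 2            # 1600 parameters per small pattern
ratio = full_params / small_params
print(ratio)                              # about a 10x reduction per pattern
```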
“Compared to the full-size pattern used by other single-pixel detection methods, the small, optimized pattern produces better image-free sensing performance,” researcher Lintao Peng said.
Further, Peng said, “The multiscale attention network in the SPOD decoder reinforces the network’s attention to the target area in the scene. This allows more efficient extraction of scene features, enabling state-of-the-art object detection performance.”
“For autonomous driving, SPOD could be used with lidar to help improve scene reconstruction speed and object detection accuracy,” Bian said. “We believe that it has a high enough detection rate and accuracy for autonomous driving while also reducing the transmission bandwidth and computing resources needed for object detection.”
To experimentally demonstrate SPOD, the researchers built a proof-of-concept setup. Images randomly selected from the Pascal VOC 2012 test data set were printed on film and used as target scenes. At a sampling rate of 5%, the average time to complete spatial light modulation and image-free object detection per scene with SPOD was just 0.016 s. This is a significant speedup over methods that first reconstruct the scene (0.05 s) and then perform object detection (0.018 s). SPOD showed an average detection accuracy of 82.2% with a refresh rate of 63 frames per second for all the object classes included in the test data set.
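The reported timings are internally consistent, as a quick check shows: 0.016 s per scene corresponds to roughly 62.5 frames per second, in line with the stated 63 fps, and the reconstruct-then-detect baseline takes about four times longer end to end.

```python
# Sanity check of the timings reported in the experiment.
spod_time = 0.016                 # s per scene: modulation + image-free detection
recon_first = 0.05 + 0.018        # s per scene: reconstruction, then detection
print(1 / spod_time)              # refresh rate implied by the per-scene time
print(recon_first / spod_time)    # end-to-end speedup of SPOD over the baseline
```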
“Currently, SPOD cannot detect every possible object category because the existing object detection data set used to train the model only contains 80 categories,” Peng said. “However, when faced with a specific task, the pre-trained model can be fine-tuned to achieve image-free multi-object detection of new target classes for applications such as pedestrian, vehicle, or boat detection.”
The researchers plan to extend the image-free perception technology to other kinds of detectors and computational acquisition systems to achieve reconstruction-free sensing technology.
The research was published in Optics Letters (www.doi.org/10.1364/OL.486078).