A single-pixel machine vision framework leverages optical networks designed by deep learning to bypass the need for an image sensor array or digital processor. The system, developed in the lab of UCLA Chancellor’s Professor Aydogan Ozcan, paves the way for tackling certain challenges that are beyond the capabilities of current imaging and machine learning technologies.
Most machine vision systems in use today pair a lens-based camera with a digital processor that performs machine learning tasks. Even with modern, state-of-the-art technology, these systems suffer certain drawbacks: because of the camera’s high pixel count, the video feed contains a large volume of data, much of it redundant. Handling that redundant information can overburden the processor and waste power and memory.
UCLA researchers created a single-pixel machine vision system that can encode the spatial information of objects into the spectrum of light to optically classify input objects and reconstruct their images using a single-pixel detector. Courtesy of Ozcan Lab, UCLA.
Additionally, fabricating high-pixel-count image sensors beyond the visible region presents challenges that constrain the application of standard machine vision methods at longer wavelengths, such as the infrared and terahertz regions.
“Here in this work, we have bypassed these challenges as we can classify objects and reconstruct their images through a single-pixel detector using a diffractive network that encodes the spatial features of objects or a scene into the spectrum of diffracted light,” Ozcan told Photonics Media. The team deployed deep learning to design optical networks created from successive diffractive surfaces to perform computation and statistical inference as the input light passes through the specifically designed 3D-fabricated layers. In contrast to lens-based cameras, these diffractive optical networks are designed to process the incoming light at predetermined wavelengths.
The goal is to extract the spatial features of an input object and encode them onto the spectrum of the diffracted light; a single-pixel detector then collects that light.
Different object types, or classes of data, are then assigned to different wavelengths. An automated all-optical classification process classifies the input images using the output spectrum detected by a single pixel, overcoming the need for an image sensor array or a digital processor.
“During the training phase of this diffractive network, we assign one unique wavelength to each object class,” Ozcan said.
In this case, the object classes were handwritten digits; the researchers selected 10 wavelengths uniformly across an available bandwidth to represent the digits 0 through 9.
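The one-wavelength-per-class assignment above implies a very simple readout rule at the detector: the predicted digit is whichever of the 10 assigned wavelength channels carries the most power. A minimal sketch of that rule, with the specific wavelength values assumed purely for illustration:

```python
import numpy as np

# Hypothetical illustration: each digit class 0-9 is assigned one of 10
# wavelengths chosen uniformly across an available band (values assumed here);
# the predicted class is the channel with the largest detected power.
wavelengths = np.linspace(1.0, 1.45, 10)  # assumed band, arbitrary units

def classify(spectral_power):
    """spectral_power: power detected at each of the 10 assigned wavelengths."""
    return int(np.argmax(spectral_power))

# Example: channel 7 carries the most power, so the inferred digit is 7.
powers = np.zeros(10)
powers[7] = 1.0
print(classify(powers))  # -> 7
```

The heavy lifting happens in the optics: the trained diffractive layers, not any electronics, route the input object's light so that this simple maximum-power readout yields the correct class.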
In addition to all-optical classification of object types, the team also connected the system to a simple, shallow electronic neural network to rapidly reconstruct the images of the classified input objects, based solely on the power detected at distinct wavelengths, demonstrating task-specific image decompression.
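The reconstruction step can be sketched as a small decoder that maps the 10 detected spectral powers back to an image. The single hidden layer, its width, and the 28 × 28 output size below are assumptions for illustration; the team's exact architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shallow decoder: 10 spectral power readings in, a 28x28
# image out. Weights here are random stand-ins; in practice they would be
# trained jointly with, or after, the diffractive front end.
W1 = rng.normal(scale=0.1, size=(10, 64))
b1 = np.zeros(64)
W2 = rng.normal(scale=0.1, size=(64, 28 * 28))
b2 = np.zeros(28 * 28)

def reconstruct(spectral_power):
    h = np.maximum(0.0, spectral_power @ W1 + b1)  # ReLU hidden layer
    return (h @ W2 + b2).reshape(28, 28)           # linear output image

img = reconstruct(rng.random(10))
print(img.shape)  # -> (28, 28)
```

The "decompression" framing follows from the dimensions: ten numbers stand in for a 784-pixel image, and the shallow network inverts that task-specific compression.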
“The diffractive features on the layers were adjusted iteratively (using deep learning principles) until the physical structure of the diffractive layers converged to patterns that yielded the largest spectral power among these 10 selected wavelengths to correspond to the class/digit of the input object,” Ozcan told Photonics Media. “In other words, the diffractive network that is located in front of the single pixel learned to channel the largest spectral power at the single pixel detector based on the input object’s class (digit).”
Researchers created a single-pixel machine-vision system that can encode the spatial information of objects into the spectrum of light to optically classify input objects and reconstruct their images using a single-pixel detector in a single snapshot. Courtesy of Ozcan Lab, UCLA.
To train the network further, an error signal was generated whenever the diffractive network misidentified, or could not identify, the input handwritten digit based on the maximum spectral power at the single-pixel detector. That error was used to adjust the diffractive features on the transmissive layers until they yielded the correct inference of the input class from the detected spectral power.
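The error-driven adjustment described above can be sketched with a simplified, differentiable surrogate: a stand-in forward model maps an input object to the power in each of the 10 class wavelengths, and the mismatch between that power distribution and the true class drives updates to the learnable parameters. A real diffractive network would simulate wave propagation through the 3D-fabricated layers; the single linear map below is only a hypothetical stand-in for that physics.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the trainable diffractive features: one linear map from a
# flattened 28x28 object to power in 10 wavelength channels.
P = rng.normal(scale=0.01, size=(28 * 28, 10))

def forward(x):
    logits = x @ P
    e = np.exp(logits - logits.max())
    return e / e.sum()  # normalized spectral power distribution

def train_step(x, label, lr=0.01):
    """One error-driven update, analogous to the iterative adjustment above."""
    global P
    probs = forward(x)
    grad_logits = probs.copy()
    grad_logits[label] -= 1.0            # softmax cross-entropy gradient
    P -= lr * np.outer(x, grad_logits)   # adjust the stand-in features
    return -np.log(probs[label])         # error for this example

x = rng.random(28 * 28)                  # a mock input object
losses = [train_step(x, label=3) for _ in range(50)]
print(losses[0] > losses[-1])            # error shrinks as features adapt
```

After training, the model channels the largest power into the channel assigned to the input's class, mirroring (in this toy setting) how the physical layers converge to patterns that favor the correct wavelength.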
Despite use of the single-pixel detector, the researchers ultimately achieved greater than 96% optical classification accuracy of handwritten digits. An experimental proof-of-concept study with 3D-printed diffractive layers showed close agreement with the numerical simulations — demonstrating the efficacy of the single-pixel machine vision framework for building low-latency and resource-efficient machine learning systems.
Applications and Usability
The framework follows research published earlier this year in which Ozcan and his colleagues used diffractive surfaces to shape terahertz pulses. The previous work, he said, demonstrated a “deterministic” task, whereas the new advance is better characterized as an implementation of “statistical inference,” since recognizing handwritten digits is not a deterministic operation.
“In our former work (pulse shaping), we exactly knew what the input pulse profile is, and what we want it to be at the output location (in other words, inputs were known),” Ozcan said. “In this work, we demonstrated a single-pixel object classifier where the input objects are unknown, new handwritten digits that were never seen by the network before. In this sense, our single-pixel diffractive classifier recognized the spatial features of handwritten digits, using the spectral power detected by a single pixel.”
The significance of both lines of research extends to the lack of readily available, high-pixel-count image sensors in the mid- and far-infrared and terahertz bands. Building on the machine vision framework, Ozcan said, one can envision a focal plane array based on a diffractive optical network operating in the infrared, applied in defense and security to detect and image certain target objects under certain conditions.
Additional applications span biomedical imaging and metrology/interferometry.
“This work shows how spatial features of a scene can be encoded in spectrum, in a single shot, and demonstrates that a properly trained diffractive network can perform this encoding to achieve all-optical classification of input objects and reconstruction of their images,” Ozcan told Photonics Media. “The core principles of this spectrally-encoded single-pixel machine vision framework can therefore be applied for building new 3D-imaging systems, finding uses, for example, in optical coherence tomography (OCT).”
The research was published in Science Advances (www.doi.org/10.1126/sciadv.abd7690).