Novel CMOS Technology Reboots Vision-Guided Robotics

An innovative CMOS sensor and parallel structured light design have improved both the resolution and shutter speed of vision systems for robot guidance.

SVORAD STOLC AND ANDREA PUFFLEROVA, PHOTONEO

A vision-guided robot (VGR) system can be defined as a robot equipped with one or more vision sensors that effectively give the machine a sense of sight. While “blind” robots may suffice for certain simple applications, they quickly reach their limits when confronted with variable tasks or environments. Vision-guided robotics enables the automation of much more complex and sophisticated tasks, including the recognition, processing, and handling of objects based on data obtained from a 3D vision system.

While 3D imaging has made robotic systems more capable of navigating variability in their work, the technology has traditionally presented an inherent trade-off between the resolution of image data and the speed at which it can be captured. Recent advancements in CMOS sensor design have suggested that this trade-off is not as insurmountable as it once appeared. Courtesy of iStock.com/nay.

The challenge is that the functionality of VGRs is directly limited by the constraints of their vision components. Among these constraints is the seemingly inherent trade-off between the resolution of image data and the speed at which it can be captured. However, recent developments in CMOS sensor design, combined with a novel structured light technique, are demonstrating that this trade-off is not as insurmountable as it once appeared.

A review of sensors

CMOS and CCD sensors are the two major types of digital image sensor used in industrial cameras today. The basic differences between the two technologies lie in their pixel architectures: how they capture each frame, read the charge from each photodiode, and then digitize the signal.

In a CCD sensor, the charge is transported across the chip and read at one corner of the array of photosensitive sites. Subsequently, an analog-to-digital converter turns each pixel’s signal into a digital value. Conversely, a CMOS sensor features several transistors at each pixel that amplify and move the charge so that pixels can be read individually.
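As a conceptual sketch only (the array sizes and values below are arbitrary, and the code mimics only the order in which pixel values become available, not real sensor physics), the difference in readout can be illustrated in a few lines of Python:

```python
import numpy as np

# Toy 4x4 "charge" array standing in for accumulated photoelectrons.
charge = np.random.default_rng(0).integers(0, 1000, size=(4, 4))

# CCD-style readout: charge packets are shifted toward a single output
# node and digitized serially by one ADC, in a fixed raster order.
ccd_stream = [int(charge[r, c]) for r in range(4) for c in range(4)]

# CMOS-style readout: per-pixel amplifiers allow direct addressing, so a
# region of interest can be read without touching the rest of the array.
roi = charge[1:3, 1:3]

print(ccd_stream[:4])   # first values off the serial CCD stream
print(roi.tolist())     # directly addressed 2x2 CMOS window
```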

While conventional multi-tap CMOS sensors used in time-of-flight cameras modulate pixels the same way in parallel, the mosaic-shutter CMOS sensor modulates its pixels individually, according to a unique pixel-mosaic code. This enables 3D cameras to capture high-resolution 3D images of objects that are moving up to 144 km/h (89 mph), with high accuracy and without motion artifacts. Courtesy of Photoneo.

Though each technology has advantages over the other, CMOS sensors surpass CCD sensors in a number of respects and have gradually become the favored choice for machine vision applications. Their high frame rates and resolution, good noise characteristics, low power consumption, and other strengths have also made them the preferred sensor for the majority of 3D vision systems currently on the market. While CCD sensors maintain a strong position in the life sciences and related applications, CMOS sensors are the predominant technology used in VGR systems.

Parallel structured light

Unlike machine vision systems that are fixed in position and focused on a particular scene, vision systems designed to support robots face particular challenges with regard to illumination. Specifically, VGRs must be able to navigate dynamic tasks without being confounded by ambient light.

While the majority of machine vision vendors base their systems on LED light sources, VGR applications often call for laser-based structured light. Lasers suppress ambient light very effectively because they concentrate a great deal of energy within a small area, creating high contrast within a narrow spectral band of only a few nanometers. Applying an optical bandpass filter can narrow the received spectral range further to suppress the remaining ambient light.
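A rough back-of-the-envelope calculation illustrates the effect; all of the numbers below are illustrative assumptions rather than measured values:

```python
# How a narrow bandpass filter boosts laser contrast against broadband
# ambient light (illustrative figures only).
laser_power_w = 1.0      # laser energy concentrated in a ~2 nm line
ambient_w_per_nm = 0.01  # assumed ambient spectral density at the sensor
sensor_band_nm = 300.0   # unfiltered sensitivity range (~400-700 nm)
filter_band_nm = 5.0     # assumed bandpass width around the laser line

unfiltered_ratio = laser_power_w / (ambient_w_per_nm * sensor_band_nm)
filtered_ratio = laser_power_w / (ambient_w_per_nm * filter_band_nm)
print(f"contrast without filter: {unfiltered_ratio:.2f}:1")  # 0.33:1
print(f"contrast with filter:    {filtered_ratio:.2f}:1")    # 20:1, ~60x better
```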

Mosaic-shutter CMOS technology offers a potential source of significantly richer image data for AI systems used in robotic guidance applications. It can also facilitate the deployment of AI directly in the vision device, obviating the need to use an external computer or industrial PC to process it. Data processing, recognition, localization, or segmentation of images based on 3D information could thus happen directly on the robotic device. Courtesy of Photoneo.

The traditional approach to 3D scanning using structured light does not allow the projection of multiple light patterns onto the scene in parallel because they would interfere with each other and result in an unreadable pattern. Therefore, conventional structured light approaches project the individual patterns sequentially, one after another. Rather than creating the structured light patterns in the projection domain, it is also possible to generate them on the other side — in the camera sensor. This is the core idea of a new approach called parallel structured light that allows the creation of multiple patterns simultaneously from a single exposure. The approach, however, required a significant redesign of the standard CMOS sensor architecture.
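A toy example makes the interference problem concrete. Summing just two binary stripe patterns in the projection domain already produces ambiguous intensities that cannot be uniquely decomposed from a single image:

```python
import numpy as np

# Two binary stripe patterns summed in the projection domain: the camera
# sees only the combined intensity, from which the individual patterns
# can no longer be uniquely recovered.
width = 8
coarse = np.array([1, 1, 0, 0] * (width // 4))  # coarse stripe pattern
fine = np.array([1, 0] * (width // 2))          # fine stripe pattern

combined = coarse + fine
print(combined)  # [2 1 1 0 2 1 1 0]: a value of 1 could come from either
                 # pattern being "on", so the observation is ambiguous
```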

Mosaic-shutter sensors

Conventional CMOS technology enables one exposure at a time: the electronic shutter opens, exposes the sensor to light to capture one image, and then closes. Such sensors are called single-tap sensors, and they are widely used in traditional structured light systems. Although they provide submillimeter resolution and high accuracy, they must pause to capture each image, which requires the scene to remain static during the acquisition of multiple frames.

The trade-off between quality and speed in state-of-the-art 3D sensors prompted the development of an approach based on a novel CMOS sensor that leverages a unique multi-tap shutter with a mosaic pixel pattern. The sensor is divided into super-pixel blocks, in which each pixel (or a group of pixels) can be modulated by a different shutter sequence to produce a specific structured light pattern — a process called temporal multiplexing. Courtesy of Photoneo.

Besides single-tap sensors, there are also multi-tap sensors, commonly used in time-of-flight cameras, which can capture multiple images with various shutter timings in a single exposure window. This capability is enabled by a special pixel design that allows programming of the exposure shutter modulation of sensor pixels. However, all pixels are modulated in parallel, which means that they all behave in the same way. Although time-of-flight cameras are very fast and provide near real-time performance, they fall short in their ability to deliver high detail at moderate noise levels.

The trade-off between quality and speed in state-of-the-art 3D sensors could not be addressed with these traditional technologies and standard CMOS pixel architectures, which prompted the development of an approach based on the parallel structured light concept.

The method uses the structured light principle in combination with a special, proprietary CMOS sensor that implements a unique pixel mosaic supporting a programmable multi-tap shutter. The sensor is divided into super-pixel blocks, where each pixel (or a group of pixels) can be modulated by a different shutter sequence to produce a specific structured light pattern — a process called temporal multiplexing. In other words, while conventional structured light systems modulate pixels in the spatial projection domain using a pattern projection, the parallel structured light technology implements the modulation directly on the sensor side in the temporal domain within a single exposure window.
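The principle can be sketched in a few lines of NumPy. The 2 × 2 block size, four time slots, and one-hot shutter codes below are illustrative assumptions, not the parameters of the actual sensor:

```python
import numpy as np

# Minimal sketch of temporal multiplexing: each pixel in a 2x2
# super-pixel block follows its own binary shutter sequence over four
# time slots within a single exposure window.
rng = np.random.default_rng(1)
T = 4                           # time slots within one exposure window
shutter_codes = np.array([      # one code per pixel in the block
    [1, 0, 0, 0],               # pixel (0,0): open in slot 0 only
    [0, 1, 0, 0],               # pixel (0,1): open in slot 1 only
    [0, 0, 1, 0],               # pixel (1,0): open in slot 2 only
    [0, 0, 0, 1],               # pixel (1,1): open in slot 3 only
])
light = rng.random(T)           # incident light intensity per time slot

# Each pixel integrates only the slots in which its shutter is open, so
# one exposure yields four differently timed measurements.
pixel_values = shutter_codes @ light
print(pixel_values.reshape(2, 2))
```

Because each pixel in the block integrates a differently timed slice of the incoming light, one exposure yields as many distinct measurements as there are codes, something a conventional multi-tap sensor, with all pixels modulated identically, cannot provide.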

The way the mosaic-shutter sensor works can be compared to the Bayer filter mosaic used in color imaging, where each color-coded pixel has a unique role in the final, debayered color output. Here, the raw sensor data from one frame gets “demosaiced” at the end of the exposure window in order to gather a set of images representing unique shutter sequences.
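Continuing the illustrative 2 × 2 mosaic from above, the "demosaicing" step amounts to splitting the interleaved raw frame into one lower-resolution image per shutter code, much as Bayer demosaicing separates color channels:

```python
import numpy as np

# Split a raw frame whose pixels interleave four shutter codes in 2x2
# blocks into four lower-resolution images, one per code.
raw = np.arange(36).reshape(6, 6)   # stand-in for one raw mosaic frame

images = {
    code: raw[r::2, c::2]           # every 2nd pixel, offset (r, c)
    for code, (r, c) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)])
}
for code, img in images.items():
    print(f"code {code}:\n{img}")   # four 3x3 images from one 6x6 frame
```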

Thus, the major difference between traditional multi-tap CMOS sensors and mosaic-shutter CMOS sensors is that in the former case all pixels are modulated the same way, while in the latter case they are modulated individually according to the pixel-mosaic code.

This approach enables the imager to construct multiple structured light patterns in parallel in a single exposure. As a result, the parallel structured light approach enables 3D area scanning and high-resolution reconstruction of objects moving at up to 144 km/h (89 mph), with high accuracy and without motion artifacts.
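A quick sanity check shows why such speeds are plausible within a single short exposure; the exposure time below is an assumed figure for illustration, since the article quotes only the 144 km/h limit:

```python
# How far an object moves during one short exposure window.
speed_kmh = 144
speed_m_per_s = speed_kmh / 3.6           # 40 m/s
exposure_s = 10e-6                        # assumed 10-microsecond window
displacement_mm = speed_m_per_s * exposure_s * 1000
print(f"{displacement_mm:.2f} mm of motion per exposure")  # 0.40 mm
```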

Potential developments

Parallel structured light technology extends robotic vision capabilities and opens the door to fundamentally new VGR use cases. For instance, applications that require hand-eye coordination can now achieve faster cycle times because a robot does not need to stop its arm movement to make a scan. Collaborative robots can also benefit from the technology because it effectively eliminates the effects of vibrations caused by the movement of the human hand. Superior quality control, inspection, measurement, and the picking and sorting of objects moving on a conveyor belt also become feasible. Further, the technology allows the scanning of crops for phenotyping or the monitoring of livestock health.

The mosaic-shutter CMOS sensors ultimately put an end to the trade-off between scanning speed and accuracy. Though the technology has emerged only recently, it offers a potential source of significantly richer image data for artificial intelligence systems. AI is only as good as the data it uses, and because the technology enables the generation of large amounts of high-quality real-world data, it may drive a new revolution in intelligent robotics.

Edge computing is another promising area to which mosaic-shutter CMOS sensors and parallel structured light could contribute. The imaging technology could facilitate the deployment of AI directly in the vision device, obviating the need for an external computer or industrial PC to process the data. Data processing, recognition, localization, or segmentation of images based on 3D information could thus happen directly on the robotic device.

Meet the authors

Svorad Stolc, Ph.D., is CTO of the sensors division at Photoneo. He is an expert in machine vision, artificial intelligence, and parallel computing and has published a number of internationally acclaimed scientific articles; email: [email protected].

Andrea Pufflerova is public relations specialist at Photoneo. She has a master’s degree from the University of Vienna; email: [email protected].


Published: September 2021