3D vision systems for industrial applications have improved in recent years, gaining greater inspection capabilities and enhanced performance in dynamic range, frame rate, and image quality. This increased capacity has resulted from upgrades in components and image-processing software.

A solder joint on a circuit board undergoing 3D vision inspection using colored illumination to indicate flat areas (red), moderate-angle areas (green), and steep-angle areas (blue). The solder joint highlighted by the box is defective. Courtesy of Omron.

On the hardware side, CMOS sensor refinements have helped to produce consistent images in diverse contexts, while global shutter — where all pixels on the sensor are exposed to light simultaneously — eliminates artifacts, enabling precise depth measurements in high-speed settings and other challenging conditions. On the software side, AI has improved point cloud creation and enabled adaptive systems that drive robotic guidance, inspection, and automation.

The next step in the technology’s evolution is deeper AI integration for enhanced recognition and learning, which will further improve performance across 3D vision applications. Reaching this point in an industrial setting, though, will require innovations that reduce the training burden and expand use cases.

3D vision applications

3D vision applications rely on the ability to capture 3D point clouds of just about any object, part, or material, according to John Leonard, product marketing manager for Zivid. But capturing a 3D point cloud — a set of data points in a 3D space — that faithfully depicts a target is traditionally difficult, depending on the nature of the subject. A shiny object, for example, produces reflection artifacts that can confuse a 3D camera. Transparency also hinders a system’s ability to accurately determine an object’s 3D shape. The best 3D cameras currently handle both situations, something that was not possible five years ago, Leonard said.

In Zivid’s case, the company makes 5-MP 3D color cameras that use structured light. The cameras project a light pattern onto a target and determine its 3D coordinates from the distortions in the pattern, a process that requires capturing the reflected light using 2D sensors. AI-assisted processing cleans up images and removes noisy or poor data.

For its part, Omron takes a different approach to 3D vision inspection. The company produces a variety of electronic products that have circuit boards with numerous chips installed. These devices and other components connect to the board through solder joints. The quality of the joints, which often number in the thousands, is critical. This drove Omron to create a 3D solder joint inspection solution, which it sells and uses during its own production process. “We design and manufacture a significant amount of electronics, and we needed to inspect our products,” said Nick Fieldhouse, product manager for Omron’s Inspection Systems Division.

Bags in a box, captured by one camera in a dual stereo vision system (top). An AI evaluation result from combining left and right images in a stereo vision system. Four bags were detected, with grasp points labeled .0 to .3 (bottom). Courtesy of Basler.

The setup uses light projected from multiple directions to extract 3D solder joint data. The main approach uses red, green, and blue sources. The red shines down from above, while the green and blue illuminate at an angle, with the blue at a steeper angle than the green. The result is a color-coded surface, with flat areas in red, moderate-angle areas in green, and steep areas in blue.
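To make the color-coding concrete, the sketch below shows one way such a map could be read in software. It is a minimal illustration under stated assumptions, not Omron's implementation: it assumes an image already captured under this red/green/blue multi-angle illumination and simply labels each pixel by its dominant channel, with the placeholder file name solder_joint.png standing in for real inspection data.

```python
import cv2
import numpy as np

# Hypothetical interpretation of a color-coded angle map: under top-down red
# plus angled green/blue illumination, the dominant channel at each pixel
# roughly indicates the local surface tilt. Flat areas reflect the overhead
# red source; steeper areas reflect the green and then the blue source.

LABELS = {0: "steep (blue)", 1: "moderate (green)", 2: "flat (red)"}

def classify_tilt(image_bgr: np.ndarray) -> np.ndarray:
    """Label each pixel 0/1/2 according to its dominant color channel."""
    # OpenCV stores channels as B, G, R, so argmax along the last axis gives
    # 0 for blue-dominant, 1 for green-dominant, 2 for red-dominant pixels.
    return np.argmax(image_bgr, axis=2)

if __name__ == "__main__":
    img = cv2.imread("solder_joint.png")  # placeholder file name
    tilt_map = classify_tilt(img)
    # Report the fraction of pixels in each tilt class -- a crude stand-in
    # for the wetting-angle and coverage measurements described above.
    for idx, name in LABELS.items():
        print(f"{name}: {np.mean(tilt_map == idx):.2f}")
```

In practice, the color map is only one input; as described next, Omron pairs it with structured-light height data before judging a joint.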
Then, system operators pair this color map with a structured light approach using several Moiré patterns — the interference patterns created when similar line patterns are overlaid with a slight offset. The lines from the pattern bunch tightly together on steep surfaces and spread out over flat ones. This method yields height information, complementing the color technique to provide accurate 3D results for solder joints and components.

Omron’s current inspection system incorporates five cameras, each with a 12- to 25-MP resolution. In addition to measuring solder joint parameters such as wetting angle and coverage, the inspection process allows features of the components themselves to be examined. Improved camera resolution over time has made faster processing and higher throughput possible, Fieldhouse said.

Sensors and cameras

Meanwhile, Martin Gramatke, product manager for 3D Product Systems at Basler, said that sensors for 3D vision have improved during the past few years through the rollout of iterations with higher dynamic range and faster frame rates, advancements that ensure reliable image quality under varying conditions. They also offer global shutter technology, an approach that eliminates motion artifacts and enables accurate depth measurements, even in high-speed applications.

Basler offers two different 3D vision methodologies in its camera systems. One is a decade-old time-of-flight approach, with the distance to a point on a target determined by how long it takes modulated light to travel there and back. The resolution is 640 × 480 pixels in xy and a millimeter in z. The second technology, introduced in 2024, uses stereo vision, created by mounting two Basler Ace cameras along a baseline. The 3D data is assembled by combining the separately collected left and right 2D images. At 75 fps and 5-MP resolution, these cameras can yield a torrent of useful 3D data for inspection.

A robot uses a 3D vision system (black camera mounted above the robot arm) during manufacturing and assembly. Courtesy of Zivid.

However, a 3D vision system does not need to overwhelm the user with data that must be processed, Gramatke said. “With on-sensor processing, 3D cameras can preprocess data before transmission, reducing computational load and improving response times.”

Regarding the technical innovations that have produced such in-camera processing capabilities, Gramatke pointed to the incorporation of neural processing units on the sensors themselves. Sony, for example, constructed its components with a stacked configuration, with a pixel chip atop a logic chip. The former captures the data from a scene, while the latter contains the circuitry necessary to process the data and the memory to store the AI model used for the processing. This approach eliminates the need for an off-chip high-performance processor or external memory, a reduction in component count useful for edge AI systems in industrial settings.
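To show the principle behind the stereo approach described above, in which depth is recovered from the disparity between rectified left and right views, here is a minimal sketch using OpenCV's standard block matcher. The focal length, baseline, and file names are placeholder assumptions rather than Basler specifications, and a production system would rely on calibrated parameters and the on-camera preprocessing discussed above.

```python
import cv2
import numpy as np

# Placeholder calibration values for illustration only.
FOCAL_LENGTH_PX = 1400.0   # focal length in pixels (hypothetical)
BASELINE_M = 0.10          # spacing between the two cameras in meters (hypothetical)

def depth_from_stereo(left_gray: np.ndarray, right_gray: np.ndarray) -> np.ndarray:
    """Compute a per-pixel depth map (in meters) from rectified stereo images."""
    # Classic block matching; real systems tune these parameters carefully.
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.full(disparity.shape, np.inf, dtype=np.float32)
    valid = disparity > 0
    # Depth follows from similar triangles: Z = f * B / d.
    depth[valid] = FOCAL_LENGTH_PX * BASELINE_M / disparity[valid]
    return depth

if __name__ == "__main__":
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # placeholder images
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
    depth_map = depth_from_stereo(left, right)
    print("Median depth (m):", np.median(depth_map[np.isfinite(depth_map)]))
```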
Incorporating AI

To see how AI-enabled technology aids 3D vision in an industrial setting, consider the pick-and-place of bags of chips or other food items randomly arranged in a bin. A traditional rules-based approach could find the edges of the bags, enabling the system to pick them up with a suction-powered grabber. A rules-only method, though, may not realize that two edges belong to the same object. Thus, the pick may fail because one bag partially covers another.

In contrast, an AI-based approach that is trained to recognize what is and is not a bag avoids this potential issue. The system finds the edges as it distinguishes the bags from each other and the surroundings. The result is a more accurate and complete understanding of the 3D space and of how the bin’s contents relate to each other and to the container, which leads to a more efficient and effective pick sequence with a greater likelihood of success.

A bin of parts (top) and their 3D poses (bottom) extracted from 2D images. With this 3D information, a robot can perform pick-and-place as part of automated processing. Courtesy of MVTec.

One challenge facing 3D vision users who want to incorporate AI into their pick-and-place solutions, or other automation applications, is training the system to deliver a repeatable, desirable outcome. With a wide enough array of images, users can develop an AI model that recognizes components in any orientation and instructs a robot on the right grip point and grip sequence. Providing an adequate number of images can be difficult, since the training set should include variations in background, lighting type, and angle, as well as part pose, size, transparency, texture, and color. Raised identification marks, for example, may appear on a part and be visible in certain poses but not others. As a result, acquiring the right training set can be among the most difficult aspects of deploying an AI solution for inspection purposes.

In a newly released version of its flagship HALCON product, machine vision software supplier MVTec said that it has a way around this problem — at least for rigid parts that are of the same size. Bertram Drost, principal research engineer, and Andreas Zeiler, product owner specializing in code reading and 3D, outlined the method in a company-sponsored presentation earlier this year. The process began with a computer-aided design file providing a mathematical description of the part, combined with data on its surface and material characteristics. From this information, MVTec generated a synthetic, randomized scene that varied textures, camera positions, lighting, perspectives, and clutter objects. Part of this scene generation involved simulated dropping of the objects, with their orientation and location being scrambled as a result. The software automatically rendered the scene to match how it would appear in real life.

Variations in training

Using this approach, MVTec initially created ~50,000 training images. However, testing showed that the resulting models did not work effectively in practice. Improving the model required extra steps in generating the training set. “We add some blurring,” Drost said. “It’s a randomized blurring [with] different types of blurring. It can be a Gaussian blur or a motion blur.”

Such a variation strategy helps in training because MVTec’s 3D matching depends on multiple 2D camera views within a vision system. These views can come from images taken at the same time by two or more cameras. They also could be multiple shots from one camera as the part passes on a conveyor belt, or from one camera that moves around the part at the end of a robotic arm. The software extracts data from these 2D images, with this information refined further to generate the final pose solution. The resulting 3D pose is accurate to within about 1% of the object diameter, Drost said. In testing, identifying five parts using four cameras took 200 ms, fast enough to run during robot movement or while the robot is otherwise busy.

In such a scenario, Zeiler concluded that end users gain an advantage from combining deep learning and rules-based methods. “That provides them a robust and fast solution for bin-picking applications while still being able to use cost-efficient 2D cameras.”
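The randomized blurring Drost describes is a common data-augmentation step, and the sketch below shows roughly what it could look like. It is an assumption-laden illustration using OpenCV rather than MVTec's HALCON code: each synthetic rendering (here the placeholder file synthetic_scene.png) is randomly given either a Gaussian blur or a simple linear motion blur before joining the training set.

```python
import random
import cv2
import numpy as np

def random_blur(image: np.ndarray) -> np.ndarray:
    """Apply a randomly chosen Gaussian or motion blur to a rendered image."""
    if random.random() < 0.5:
        # Gaussian blur with a randomly chosen odd kernel size.
        k = random.choice([3, 5, 7])
        return cv2.GaussianBlur(image, (k, k), 0)
    # Simple linear motion blur: a normalized single-row kernel rotated
    # to a random angle, then convolved with the image.
    k = random.choice([5, 7, 9])
    kernel = np.zeros((k, k), dtype=np.float32)
    kernel[k // 2, :] = 1.0 / k
    angle = random.uniform(0.0, 180.0)
    rot = cv2.getRotationMatrix2D((k / 2 - 0.5, k / 2 - 0.5), angle, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (k, k))
    kernel /= max(kernel.sum(), 1e-6)  # keep overall brightness unchanged
    return cv2.filter2D(image, -1, kernel)

if __name__ == "__main__":
    rendering = cv2.imread("synthetic_scene.png")  # placeholder rendering
    augmented = random_blur(rendering)
    cv2.imwrite("synthetic_scene_blurred.png", augmented)
```

The idea is to bring the synthetic renderings closer to the focus and motion imperfections of real production images.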
Looking to the future

As these 3D vision systems mature, Omron’s Fieldhouse predicts continued improvement in the company’s AI tools to help guide them. These advancements will allow even more accurate results and reduce the need for customers to rely on their own engineering resources.

Zivid’s Leonard said that AI continues to be incorporated into the information-processing chain used in 3D vision. It is also integral to the foundation models that promise to make robots’ approach to tasks similar to that of human inspectors. Thus, a robot running a foundation model would potentially peer into a box with 3D vision, see a tangle of parts, and figure out how to extract them as a person would. Such AI trends are just beginning to have an effect in 3D vision, but Leonard said that they could fundamentally disrupt the way that these systems are built going forward.