Embedded Vision Systems Usher Deep Learning into the Imaging Domain

MICHAEL EISENSTEIN, CONTRIBUTING EDITOR

How would one decide which potato to select out of a group? Most people have an intuitive sense that guides them when selecting a spud, for example, at the market or from the garden, but it becomes difficult to pen an authoritative description of what differentiates healthy tubers from their underripe or infected counterparts.

Such tasks pose a significant challenge in the agricultural world, where food safety guidelines and quality standards extend from the farm to processing facilities and ultimately to retailers. Computer-guided machine vision systems are deployed in a wide range of industries, and most of these systems rely on rules-based algorithms that provide concrete guardrails for assessing and classifying parts and products. But these rules can be very hard to comprehensively define for organic products. This makes it challenging to implement machine vision for large-scale farming or produce-processing efforts.

Deep learning offers a more versatile and flexible alternative to rules-based algorithms for many tasks, such as assembly line inspections. Courtesy of Zebra Technologies.

“Sometimes they’re dirty, and sometimes they’re misshapen, and how do you differentiate a potato from a rock?” said Jeff Bier, president and cofounder of engineering consultancy firm BDTI. “That’s where you really need deep learning.”

Deep learning describes a subset of AI methods in which algorithms are trained to recognize hidden patterns in data. In the machine vision world, this means algorithms are initially trained on carefully chosen, labeled images that depict — in the case of the aforementioned example — both consumer-grade and low-quality potatoes as well as non-potato objects. This yields a model that allows the algorithm to interpret new data based on complex and subtle features that it discerned from the training data set.
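The train-then-interpret workflow described above can be sketched in a few lines. This is a minimal illustration and not deep learning itself: a nearest-centroid classifier stands in for the neural network, and each image is assumed to have already been reduced to two hypothetical numbers (a roundness score and a blemish score), whereas a real deep learning model would discover its own features from the pixels. The function names and all values are invented for the example.

```python
from math import dist
from statistics import fmean


def train(examples):
    """Build a model from (feature_vector, label) pairs: one average point per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Interpret new data: return the label whose average point is closest."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical labeled training set: (roundness, blemish score), both on a 0-1 scale.
TRAINING = [
    ((0.80, 0.10), "good potato"),
    ((0.75, 0.15), "good potato"),
    ((0.70, 0.80), "bad potato"),
    ((0.65, 0.90), "bad potato"),
    ((0.20, 0.40), "rock"),
    ((0.30, 0.50), "rock"),
]

model = train(TRAINING)
print(predict(model, (0.78, 0.12)))  # an unseen, smooth, round object
print(predict(model, (0.25, 0.42)))  # an unseen, angular object
```

The point of the sketch is the division of labor: nobody writes a rule that separates a potato from a rock. The labeled examples do that work, and the quality of the result depends on how well those examples were chosen.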

Embedded vision systems are increasingly benefiting from deep learning capabilities. In this inspection scenario, a camera system focuses on a product (bottom). The machine vision system uses deep learning to analyze the resulting image and identify a defect. Hundreds of hours of training on ‘good’ and ‘bad’ examples enable the system to deliver an accurate assessment of inspected parts or products. Courtesy of Cognex.

These algorithms have proved to be an empowering force for machine vision, as deep learning can imbue cameras and sensors with expertise that would be impossible or difficult to encode. It can also offer a more efficient alternative for tasks that are amenable to rules-based analysis, for example, analyzing packages or machine parts on an assembly line. Eric Hershberger, applications engineering manager at Cognex, recalls laboring over a standard algorithm for optical character recognition (OCR), in which a camera is used to read and interpret written text. “It literally took me two months of programming to make that system work,” Hershberger said. With deep learning, the same task took just a few hours of effort and delivered superior performance.

Increasingly, these capabilities are being directly integrated into the cameras themselves, enabling users to achieve remarkably sophisticated image processing in a single, compact package. While this democratization of deep learning holds transformative potential for numerous industries, manufacturers are still working out how to make this technology as accessible as possible. “They want an easy solution, but very powerful — this is the main expectation from the customers,” says Gerard Ester, manager of vision solutions at SICK, a developer of sensor solutions supporting a wide range of industrial applications.

All aboard

The idea of moving image analysis capabilities from computers into cameras is not new; according to Hershberger, such so-called embedded vision systems began to enter the market in the late 1990s. Embedded vision systems greatly reduce the hardware and equipment required for a machine vision setup and mitigate the security risks associated with, for example, having all the image data on a factory floor flow into a single PC for analysis.

In the last quarter century, these systems have become increasingly compact, and machine vision is now routinely deployed for consumer use as well as in industrial environments. The same underlying concepts are being applied to help cellphone users take better pictures and in vehicles to support driver safety features.

By 2010, Bier had become so enthusiastic about the technology that he founded a dedicated industry organization. “We realized this is going to be a game changer — the ability to embed computer vision in everyday devices,” Bier said. The organization, founded as the Embedded Vision Alliance and since renamed the Edge AI and Vision Alliance, is well known as the host of the annual Embedded Vision Summit.

Amid the ascent of embedded vision technology, however, engineers and systems designers did not immediately overcome the logistical difficulty of cramming the sophisticated graphics processing units (GPUs) that power AI applications into compact systems. These mighty processors require considerable power as well as strategies for controlling the heat that they produce as a by-product. Until recently, virtually all embedded vision devices relied on rules-based algorithms. Most of the early implementations of deep learning in machine vision systems were based on more traditional designs, in which image data from “dumb” cameras is relayed to a centralized PC or cloud-based system for GPU-powered analysis.

Deep learning capabilities are increasingly integrated into imaging systems themselves, enabling sophisticated image processing in a compact package. Courtesy of IDS Imaging.

In the past decade, the rapid evolution of smartphones and other Internet of Things devices has been a major boon, motivating chip developers to find creative strategies to shrink their processors. “There has been this huge surge of R&D investment over the last 10 years … to create these domain-specific architectures that are 20, 50, 100 times more efficient at running these deep learning models,” Bier said. This new generation of GPUs, developed by the likes of Sony, Qualcomm, Intel, and NVIDIA, reduces both the financial and energy requirements of AI computers. At the same time, their heat output is easier to manage. Patrick Schick, product owner at industrial camera developer IDS Imaging Development Systems, said that technology for accelerating deep learning processing is now available as system-on-a-chip (SoC) devices, in which all the core components of a computer have essentially been shrunk down to a tiny camera-scale form factor. This leads to even greater processing efficiency.

And critically, this integration minimally affects the camera design itself, such that embedded vision system developers are generally able to avoid a return to the design drawing board. For example, SICK launched its first deep learning-powered system, the Inspector 83x, in June. Aside from the product’s deep learning element, Ester said that these systems use the same fundamental components, including the camera optics and lighting components, as in the company’s other embedded vision offerings.

Deep learning can be intimidating for customers without coding experience, and user-friendly software has been critical to the broad commercialization of these platforms. Software companies, such as MVTec, specialize in deep learning for machine vision, but many pioneers in this space have opted to internally develop their own toolboxes. For example, IDS offers users its Lighthouse suite as a streamlined tool for training its NXT family of deep learning embedded vision systems. “The basic idea behind this is that a customer or a user could come to us and only have images and say, ‘this is good; this is not good,’” Schick said. “They do not have to have an in-depth understanding of AI.”

From factories to farms

As barriers to adoption and costs continue to decrease, the application space for deep learning-powered machine vision has steadily grown. Some of these applications involve tasks that are relatively routine yet require a level of precision and repeatability that would be difficult or impossible for human workers. Deep learning also has the advantage of codifying expert knowledge without requiring humans to formally identify and define rules to guide the algorithm.

Most current implementations of deep learning fall into one of two broad categories. The first is classification, where embedded vision is used to sort through a steady procession of objects and determine which are ready for production and which are not — and why. Regarding one implementation, Andy Zosel, senior vice president for Advanced Data Capture, Machine Vision & Robotic Automation at Zebra Technologies, said that customers are using Zebra’s embedded vision systems on automotive assembly lines to inspect the connectors that couple electrical wiring harnesses in cars. “If they are not clipped together correctly, it’s going to vibrate and fall apart, and you’re going to start running into electrical issues,” Zosel said. “But verifying those as ‘clipped’ or ‘not clipped’ with traditional cameras is extremely difficult.” Deep learning makes it straightforward to classify assemblies as clipped or unclipped even when all the various components are black and difficult to distinguish in an image.

A deep learning-based machine vision system detects and indicates the presence of an abnormality. Courtesy of IDS Imaging.

The other major category of deep learning tasks is anomaly detection, in which the system is used to detect defects or damage. In one such use case, SICK’s cameras are being deployed to identify flaws in the packaging or loading of consumer products, such as bottles of detergent or food containers, or to inspect the equipment used for injection molding to ensure a steady flow of consistently manufactured plastic components.
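The logic of anomaly detection can be illustrated with a simple sketch. It rests on an assumption that the article does not spell out: that the system learns what "normal" looks like from good examples only and flags anything that strays too far from it. A statistical z-score test stands in for the deep learning model here, and the measurements (fill level and cap tilt for a detergent bottle), the function names, and the threshold are all hypothetical.

```python
from statistics import fmean, pstdev


def fit_normal_profile(good_samples):
    """Learn the mean and spread of each measurement from known-good samples only."""
    return [(fmean(column), pstdev(column)) for column in zip(*good_samples)]


def anomaly_score(profile, sample):
    """Largest deviation from normal, in standard deviations, across all measurements."""
    return max(
        abs(value - mean) / (std or 1e-9)  # guard against a zero spread
        for value, (mean, std) in zip(sample, profile)
    )


def is_anomalous(profile, sample, threshold=3.0):
    """Flag the sample if any measurement is more than `threshold` deviations off."""
    return anomaly_score(profile, sample) > threshold


# Hypothetical good bottles: (fill level in mm, cap tilt in degrees).
GOOD_BOTTLES = [
    (100.0, 0.5),
    (101.0, 0.4),
    (99.0, 0.6),
    (100.5, 0.5),
    (99.5, 0.5),
]

profile = fit_normal_profile(GOOD_BOTTLES)
print(is_anomalous(profile, (100.2, 0.5)))  # close to the good examples
print(is_anomalous(profile, (100.0, 3.0)))  # cap is badly tilted
print(is_anomalous(profile, (92.0, 0.5)))   # bottle is underfilled
```

Unlike the classification case, no catalog of defect types is needed up front. The trade-off is that the system can say only that something is unusual, not what went wrong.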

A properly trained deep learning algorithm can also acquire a level of flexibility that is inaccessible to rules-based algorithms. “It’s better at handling environmental changes — so if it gets a bit brighter or darker, or if the object under inspection is now a bit further to the left or right, the AI doesn’t care,” Schick said.

A depiction of a real-time ‘good/no-good’ inspection for food products using SICK’s Inspector 83x camera. Courtesy of SICK.

This can be a major asset when embedded vision is used outside the controlled environment of a factory floor. One example, Bier said, involves the John Deere See and Spray system, which pairs machine vision and deep learning to guide the precise dispersal of agricultural chemicals. “It looks at each individual little plant and it says: ‘Is that a weed or is that a crop?’” Bier said. “If it’s a weed, it gets the herbicide, and if it’s a crop, it doesn’t.” In 2022, John Deere reported that this system was allowing farmers to cut their herbicide use by up to two-thirds.

Training day

Implementing deep learning is becoming considerably easier, but there are still important considerations and challenges that both users and camera manufacturers are continuing to grapple with.

One is the training process, in which the deep learning algorithm is fed examples that allow it to build a model for interpreting visual data. Sometimes, this can be a breeze. “If we’re just trying to find a couple differences, we can do a quick train with just a few images to figure out what’s good [and] what’s bad,” Hershberger said. A few days of testing with this rough model is often sufficient to achieve robust performance, he said.

Other applications require more extensive training to ensure that the algorithm reaches solid conclusions. “If you only train it on a few images, it’s very easy for it to get fooled,” Zosel said. An undertrained system is apt to misinterpret data when the samples on which it is trained are insufficiently varied. As an example, Zosel cites a scenario in which the AI concludes that “every time I saw a Y in the corner, there was a defect, so that must mean a Y in the corner means there’s a defect.”
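Zosel's "Y in the corner" scenario can be reproduced with a toy learner. The sketch below is an illustration under stated assumptions, not a deep learning model: it uses a one-feature decision rule (a decision stump) that simply keeps whichever yes/no feature best matches the labels it was shown. Both features and every data point are hypothetical.

```python
def train_stump(examples):
    """Return the index of the single binary feature that best matches the labels."""
    n_features = len(examples[0][0])

    def accuracy(index):
        return sum(features[index] == label for features, label in examples) / len(examples)

    return max(range(n_features), key=accuracy)


def predict(feature_index, features):
    """Predict defect (1) or good (0) by reading the one feature the stump kept."""
    return features[feature_index]


# Feature 0: a "Y" is printed in the corner. Feature 1: a scratch is visible.
# Label: 1 = defect, 0 = good.
NARROW = [
    ((1, 1), 1),
    ((1, 1), 1),
    ((1, 0), 1),  # defective, but the scratch is too faint to register
    ((0, 0), 0),
    ((0, 0), 0),
]

# The same set plus parts that break the accidental link between "Y" and defects.
VARIED = NARROW + [
    ((1, 0), 0),  # good part that happens to carry a "Y"
    ((1, 0), 0),
    ((0, 1), 1),  # scratched part without a "Y"
    ((0, 1), 1),
]

narrow_model = train_stump(NARROW)
varied_model = train_stump(VARIED)

good_part_with_y = (1, 0)
print(predict(narrow_model, good_part_with_y))  # model trained on the narrow set
print(predict(varied_model, good_part_with_y))  # model trained on the varied set
```

In the narrow set, the "Y" marker happens to predict defects perfectly, so the learner keys on it and rejects a good part that carries the marker. Adding varied examples breaks that coincidence and pushes the learner toward the scratch, which is the signal that actually matters.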

This can become a substantial burden. Some products have an inherently slow production process, such as aerospace components, and in such cases, Ester recommends that users assemble a stockpile of training examples before rolling out their deep learning system. Additionally, it can be tough to authoritatively detail all the possible flaws and failure modes for a product, Hershberger said. He said that his team spends a lot of its time developing and reviewing training procedures with quality control experts.

In some cases, embedded vision developers can formulate generalizable deep learning models, and these can essentially be run “out of the box.” Zebra specializes in OCR applications, and Zosel said that the company’s Aurora deep learning model can be run off a standard central processing unit while still delivering accurate text interpretation for English alphabet-based languages. Importantly, these capabilities can still be extended to other alphabets with additional training.

Trade-offs in terms of image quality also exist. Embedded vision GPUs are generally weaker and slower than their PC-scale counterparts, and this makes it difficult to work with large, high-resolution picture data.

The use of deep learning can compensate for poor image quality to some extent, but experts caution against settling for less. “I really don’t like that argument at all,” Hershberger said. Better image quality improves the algorithm’s analytical performance and reduces the amount of effort required for training. Speed is also a common limitation with current processors, although a few embedded vision systems on the market are achieving the throughput needed for many areas of manufacturing. “We can do this kind of AI inspection in tens of milliseconds,” Ester said, regarding SICK’s Inspector 83x camera.

Embedded vision systems can also gain a speed and efficiency boost by combining the strengths of rules-based and deep learning approaches. “If you can find the object and locate it in some segment using traditional algorithms and then pass that found object to the AI … you can kind of balance speed and requirements,” Zosel said.
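The hybrid approach Zosel describes can be sketched as a two-stage pipeline. In this minimal example, a brightness threshold plays the role of the traditional algorithm that locates the object, and only the cropped region is handed to the model. The image is a small grid of gray values, and `placeholder_model` is a hypothetical stand-in for a trained deep learning classifier; none of these names come from a real vision library.

```python
def locate_object(image, threshold=128):
    """Rules-based step: bounding box of every pixel brighter than the threshold."""
    rows = [r for r, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [c for c in range(len(image[0])) if any(row[c] > threshold for row in image)]
    if not rows:
        return None
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1  # top, bottom, left, right


def crop(image, box):
    """Cut the located region out of the full frame."""
    top, bottom, left, right = box
    return [row[left:right] for row in image[top:bottom]]


def inspect(image, classify):
    """Find the object cheaply, then pass only that patch to the model."""
    box = locate_object(image)
    if box is None:
        return "no object"
    return classify(crop(image, box))


def placeholder_model(patch):
    """Hypothetical stand-in for a trained network: any dull pixel counts as a defect."""
    pixels = [p for row in patch for p in row]
    return "good" if min(pixels) > 200 else "defect"


# A 6 x 8 frame that is dark except for a bright 2 x 3 object.
frame = [[0] * 8 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4, 5):
        frame[r][c] = 255

patch = crop(frame, locate_object(frame))
print(len(frame) * len(frame[0]), "pixels in frame,", len(patch) * len(patch[0]), "sent to the model")
print(inspect(frame, placeholder_model))
```

Because the model sees only the small patch rather than the whole frame, the expensive step handles far less data, which is the balance of speed and requirements that the quote refers to.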

A vision for the future

At present, a relatively limited number of deep learning-enabled offerings exist in the embedded vision space. But Bier envisions that the availability of such systems on the market will evolve rapidly, as will their performance capabilities. “If somebody looked at an application two or three years ago and it wasn’t feasible, it might be feasible today or a year from now,” Bier said.

3D imaging is one example of a capability that was previously out of reach. Multiple companies released 3D embedded vision systems enhanced by deep learning capabilities in the past year, including Zebra’s 3S Series of sensors, as well as the In-Sight L38 from Cognex.

“Finding random defects in a 3D space is something that we’ve needed for a very long time,” Hershberger said, highlighting that differences in surface reflection in 3D can reveal scratches and flaws that would escape notice in a 2D image. But this is still evolving technology and introduces additional challenges. For one, these cameras can be much harder to set up to achieve consistent and accurate imaging. They may also require significantly more training than 2D systems.

As progress marches on, the emergence of ever faster and more energy-efficient GPUs — and their more advanced cousins, the neural processing units — could equip the next generation of embedded vision systems with the speed, resolution, and accuracy needed to start completing tasks that were once the sole domain of human experts.

“It opens up a whole set of applications,” Zosel said. “But it’s definitely more complex than just saying, ‘unplug the human, plug in the camera.’”