A worldwide pandemic that upended social interactions and economic activity put retailers to the test. In an environment where profit margins are always thin, retailers must meet customer expectations about product availability and service. They must also contend with shrinkage, or merchandise disappearing from the shelves due to theft or customers accidentally walking out with product without paying.

An artist’s depiction of the food market of the retail environment of tomorrow, in which AI-enabled embedded cameras monitor inventory. Courtesy of Store shelves: iStock.com/S-S-S; camera: iStock.com/ChubarovY.

Even before COVID-19, retailers were turning to vision technology for better inventory management, theft prevention, and productivity improvements. The pandemic accelerated the trend. But the rollout has not been without problems, which ongoing advancements in vision, computing, and algorithms could help solve.

The deployment of vision got a boost when retailing giant Amazon launched its first cashierless Amazon Go store for its own employees at the end of 2016. Shortly afterward, the company opened the first of the stores to the general public. Amazon’s cashierless technology tracks people’s purchases using vision, following shoppers as they pick up items, set them down again, or put them in a basket.

While Amazon closely guards details about its technology, firsthand accounts reveal that a typical store of a few thousand square feet has a hundred or so ceiling-mounted cameras surveying the entire area. Most are standard color cameras with custom processing boards, while the rest are time-of-flight depth-sensing cameras. The vision data is augmented by weight sensors mounted in the shelves. Customers enter the store after scanning a QR (quick response) barcode displayed by an app on their smartphones, which identifies their account and connects to their payment method.
Using AI classification models that run mostly on a server, the system determines the items purchased and then bills the customer, all without cashier interaction. The company releases no data about system performance, but television reporters on test-buying runs have accidentally walked out of stores with small items, such as yogurt cups, without paying. Amazon announced in March that it would sell its Just Walk Out technology to other retailers.

Cameras and edge computing mounted in rails near the overhead lights enable retail customers to check out without visiting a cashier or a self-checkout station. The technology also simplifies inventory control and reduces losses. Courtesy of GetGo and Grabango.

“Amazon Go obviously touched off a really interesting conversation for retailers about the future,” said Rugwed Phatak, senior director of marketing for GetGo Café + Market, a Pittsburgh-based convenience store chain with 267 locations in the Midwest. GetGo is focused on serving sandwiches, pizzas, and other fresh food to consumers, as well as supplying more traditional convenience store products and services. The company is interested in technology that automates operations, cuts costs, improves efficiency, and allows more staff to be deployed to a store’s kitchen.

“First and foremost, technology like this has to work for the consumer,” Phatak said.

Monitoring inventory

The chain is testing vision-based technology that monitors inventory, enables consumers to simply pick up items and go, or both, he said. One test involves technology from Grabango of Berkeley, Calif. Another trial uses products and services from Chicago-based Cooler Screens.

Grabango’s approach is similar to Amazon’s Just Walk Out technology and enables a touchless and cashier-free checkout experience. It also saves time. When a store is busy, a shopper using traditional methods may spend about 45 s checking out.
In contrast, using the automated Grabango approach, people exiting the store need only scan a smartphone app that links to an account, a process that takes 1.3 s, according to a company study.

Grabango’s solution begins with fixtures that the company calls G-rails, which contain both cameras and processors. G-rails are hung by the same team that typically hangs light fixtures in the store. “They just pop up and connect to the ceiling. And you don’t need a whole squad of engineers out there to fiddle with the optics to get the system recognizing products accurately,” said CTO Ryan Smith.

By design, the system doesn’t read product barcodes. The cameras provide a color image of the product with a resolution roughly equal to that of the human eye at a distance of greater than 8 ft, Smith said. Because these vision systems go in as a retrofit to existing stores, he said, lighting conditions can be variable. Positioning the cameras to optimize the image can also be difficult or impossible. Finally, the system uses only vision sensors, part of an approach that seeks to minimize installation time and any changes to normal store operations.

Camera coverage blankets the entire store, capturing images over an area of up to 5000 sq ft for a convenience store and possibly hundreds of thousands of square feet for a larger retailer. Neural nets, which Smith said are a key enabling technology, analyze the images captured by the cameras and do a good job of handling generalized variation. This capability allows the compute platform in the rails to deal with less-than-perfect lighting and non-ideal camera placement. The system can also accommodate the introduction of new products. It does take some training, though, for the system to update its models to recognize each type of product from a changing mix. As installed, the model successfully distinguishes between when people pick an item up or place it back on the shelf and when they put it in a basket.
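To make the pick-up, put-back, and basket distinction concrete, here is a minimal Python sketch of how a stream of such classified events could be folded into a bill. It is an illustration only and not Grabango’s or Amazon’s method: the event names (`pick_up`, `put_back`, `put_in_basket`), the function `items_to_bill`, and the rule that anything still in hand at the exit is billed are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical event labels; a real system would emit its own.
VALID_ACTIONS = {"pick_up", "put_back", "put_in_basket"}


def items_to_bill(events):
    """Fold (action, item) events for one shopper into a bill.

    Assumption for this sketch: items in the basket or still in hand
    when the shopper leaves are billed; items put back are not.
    """
    in_hand = Counter()
    in_basket = Counter()
    for action, item in events:
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        if action == "pick_up":
            in_hand[item] += 1
        elif action == "put_in_basket":
            # Move the item from hand to basket if it was seen in hand.
            if in_hand[item] > 0:
                in_hand[item] -= 1
            in_basket[item] += 1
        else:  # "put_back"
            # Prefer returning an item from the hand, then from the basket.
            if in_hand[item] > 0:
                in_hand[item] -= 1
            elif in_basket[item] > 0:
                in_basket[item] -= 1
    # Counter addition keeps only items with a positive count.
    return dict(in_hand + in_basket)


if __name__ == "__main__":
    trip = [
        ("pick_up", "yogurt"),
        ("put_back", "yogurt"),      # changed their mind
        ("pick_up", "soda"),
        ("put_in_basket", "soda"),
        ("pick_up", "chips"),        # carried out in hand
    ]
    print(items_to_bill(trip))
```

In this sketch the yogurt never reaches the bill, while the soda and chips do. The hard part in practice is the classification that produces the events, which is where the neural nets described above come in.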
Training improves the system’s accuracy, a parameter that Grabango always monitors. The refund rate, a proxy for the system’s accuracy, runs at 0.03% of total dollar volume in a convenience store setting. Thus, for the most part, the system correctly identifies the items that shoppers actually meant to purchase.

Powerful processing

Advancements in edge computing have made it possible to place processors in the G-rails that possess the necessary horsepower to do at least some of the classification. In addition to the computing happening in the G-rails, the system includes a small server located in the store itself that aggregates data and acts as the connection point to the cloud.

One of the advantages of this technology for retailers is reduced shrinkage. Typically, retailers experience a loss of 1% to 3% because goods are stolen, consumed, or accidentally pocketed. With Grabango’s technology, this figure is reduced threefold or more, said Smith. These savings add to the bottom line.

The data on items sold goes to the store’s inventory control management system and gets tallied on a receipt for the customer. Smith said Grabango is in a scale-up phase and that the data generated by the system is likely to have other uses in the future.

The other vision technology being evaluated by GetGo is supplied by Cooler Screens. As the name implies, the company’s screens transform the usual see-through door on grocery coolers into a display screen that shows digital advertisements when no one is nearby. However, when people are in front of the case, a sensor detects their proximity, and the screen then displays the products that lie within the cooler.
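To put the shrinkage figures cited by Smith in dollar terms, the short Python calculation below works through the arithmetic. It rests on assumptions that are not in the article: the 1% to 3% loss is taken as a share of annual sales, “threefold” is read as dividing the loss by three, and the $1 million sales figure is a hypothetical store chosen only for illustration.

```python
def shrink_loss(annual_sales, shrink_rate):
    """Dollar value lost to shrinkage at a given rate (0.01 means 1%)."""
    return annual_sales * shrink_rate


def savings_from_reduction(annual_sales, shrink_rate, reduction_factor=3.0):
    """Dollars retained if shrinkage is cut by `reduction_factor`.

    Assumption: a "threefold" reduction means the loss is divided by 3.
    """
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    before = shrink_loss(annual_sales, shrink_rate)
    after = before / reduction_factor
    return before - after


# Hypothetical store with $1 million in annual sales (illustrative only).
HYPOTHETICAL_ANNUAL_SALES = 1_000_000

for rate in (0.01, 0.03):
    saved = savings_from_reduction(HYPOTHETICAL_ANNUAL_SALES, rate)
    print(f"shrink rate {rate:.0%}: about ${saved:,.0f} retained per year")
```

Under these assumptions, the hypothetical store would retain roughly $6,700 to $20,000 a year, which is meaningful when margins are thin.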
Vision sensors play a key role in the inventory management aspect of Cooler Screens, said Andrew Lipsman, vice president of marketing. “The technology is located on the interior of the cooler door and uses proprietary AI to determine whether a particular product is in stock or out of stock on the shelf at any given time.” This information, in turn, can help to ensure that shelves are not empty, for example by alerting staff when restocking is needed. The data can also play a role in making sure replacement product is ordered.

While other firms install their own vision sensors to provide services to retailers, solutions from Singapore-based Trax Image Recognition work with images from smartphones, IoT shelf-edge cameras, dome cameras, autonomous robots, and other sources. The only requirement is that the imaging resolution be equivalent to what a human can see from a distance of about 1 m, or just over 3 ft.

Inexpensive cameras, combined with automated image analysis, can monitor inventory on a retail shelf, provide data about product availability, and help to ensure a smooth flow through the product supply chain. Courtesy of Trax.

“Trax’s computer vision solutions are powered by proprietary, fine-grained image recognition algorithms, with more than 10 years of constant machine learning, enabling us to digitize the physical world of retail,” said Mark Cook, executive vice president and head of retailer solutions at the company. He said Trax’s mature IoT system provides the ability to control and monitor hundreds of imaging devices per retailer. Many of the devices are low-cost vision sensors similar to those used in the mobile and automobile industries.

After a customer selects a product from a cooler, Cooler Screens’ vision-based AI technology determines whether the inventory is out of stock. Courtesy of Cooler Screens.
The company’s fine-grained algorithms are possible, in part, because Trax maintains a proprietary database of nearly 12 billion images of products found on retailers’ shelves. The database and deep learning technology allow the system to detect near-identical or multiple products despite odd angles, poor lighting, or other issues that hinder identification. The accuracy of detection is high, said Cook, and new images are constantly added and adjustments made to maintain that level of performance.

The vision-derived data is used to determine the inventory that is actually available on the shelf at any given time. The live information helps demand planners keep products on the shelf, and it can also aid online shoppers who are looking for a specific item or brand.

Retailers need accurate, actionable, and frequent inventory data, attributes that a traditional manual approach to tracking often fails to supply. In contrast, an automated approach can allow retailers to understand changes to inventory in near real time, enabling them to make adjustments along the supply chain as market conditions shift, Cook said.

An artist’s rendition of the grocery shopping experience of tomorrow, where cameras track inventory and human cashiers are not required. Courtesy of iStock.com/Hugethank9.

Despite progress, vision-based technology used by retailers is still maturing and improving. GetGo’s Phatak said that an electronic receipt from a cashierless shopping trip takes time to get to a customer. While this latency is steadily growing shorter, he would like to see receipt delivery occur in near real time. Getting to this point requires faster processing steps, a goal that could be helped along via vision advancements such as higher-resolution cameras, additional spectral data, or increased spatial information in the form of a 3D point cloud, as well as improved processors and/or algorithms to better analyze image data.
Deploying these systems is also challenging and may require installation and training time. These steps cannot be completed quickly, particularly if a retailer has many locations.

Deloitte Digital Spain, part of the multinational professional services network Deloitte, is acting as a systems integrator and using Trax technology to bring solutions to the retail sector, said director of retail Ignacio Moreno. Machine vision and AI are central to this effort, which could lessen the 4.1% revenue losses that retailers suffer due to misplaced products, empty shelves, and in-store operational inefficiencies.

“This technology is part of the future of grocery retail digitization, and not only for the processes that occur in stores. It will also be important to interact with the customer and offer them a great shopping experience,” Moreno said.

The technology will generate large amounts of information. Beyond enabling improved inventory control and eliminating checkout lines, he said, the data could eventually ensure the highest possible shopper satisfaction, for instance by helping stores to forecast demand and keep shelves stocked. Exactly how the information may be used will be discovered through technology deployment, investigation, and experimentation.

Phatak is looking forward to having these vision technologies installed in multiple stores, in Pittsburgh initially, which will allow the company to gather widespread customer feedback. Usage across more stores will also lead to a better understanding of the ultimate payoff of using the technology, whether in terms of reduced shrinkage, better allocation of labor, improved inventory control, customer satisfaction, or other benefits.

“You don’t really get to stress-test the hypothesis of the technology and the business case until you drive scale,” Phatak said. He likened the technology’s present stage of development to a baby learning to walk. “We’re very much in a crawl, walk, run situation,” he said.
“So, we’re crawling, and we’re about ready to stand up.”