AI Method Enables Controllable Image Synthesis

A method that controls how AI systems create images has applications for autonomous robotics and AI training. The method builds on conditional image generation, an established AI task, to give users more control over the resulting images and their layout.

Developed by researchers at North Carolina State University (NC State), the new technique specifically trains the AI system to control certain image characteristics across a series of pictures that may show movement or other changes.

In conditional image generation, the AI system is trained to create images that meet a specific set of conditions. For example, the system could be trained to create original images of cats or dogs, depending on which animal the user requested. More recent advances allow the AI system to be trained to meet conditions that specify image layout, such as where a tree should be placed on the screen.
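To make the idea of a layout condition concrete, the following is a minimal sketch in Python, not the researchers' code. It assumes a layout is given as a list of labeled bounding boxes and turns it into a grid of labels that a generator could, in principle, be conditioned on. The function name `rasterize_layout`, the box format, and the `"background"` label are all illustrative assumptions.

```python
def rasterize_layout(layout, height, width, background="background"):
    """Turn (label, box) pairs into a height x width grid of labels.

    Assumed box format: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    Boxes listed later overwrite boxes listed earlier where they overlap.
    """
    grid = [[background] * width for _ in range(height)]
    for label, (x0, y0, x1, y1) in layout:
        # Clip the box to the canvas so out-of-range boxes are harmless.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = label
    return grid


# Hypothetical request: place a tree in the upper middle of a 4 x 3 canvas.
layout = [("tree", (1, 0, 3, 2))]
label_map = rasterize_layout(layout, height=3, width=4)
for row in label_map:
    print(row)
```

A real system would work at image resolution and feed such a map, or the boxes themselves, into a trained network. The sketch only shows what "specifying where a tree should be placed" can look like as data.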

In new work, the NC State team has taken these techniques even further to give users control of image synthesis from reconfigurable structured inputs.

“Like previous approaches, ours allows users to have the system generate an image based on a specific set of conditions. But ours also allows you to retain that image and add to it,” professor Tianfu Wu said. “For example, users could have the AI create a mountain scene. The users could then have the system add skiers to that scene. Our approach is highly reconfigurable.”

The new AI method enables the system to create and retain a background image, while also creating figures that are consistent from picture to picture, but show change or movement. Courtesy of North Carolina State University.

The NC State technique allows users to train the AI system to manipulate specific image characteristics so that objects in the images retain their identity even if they move or otherwise change. For example, the AI system could create a series of images showing the same skiers turning toward the viewer as they move across the landscape.

The researchers created a model for the layout-to-mask-to-image task, in which the AI system learns to unfold object masks in a weakly supervised way based on an input layout and object style codes. To ensure a strong connection between the input layout and synthesized images, the researchers connected the layout-to-mask component with layers deep in the generator network.
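The mask stage of layout-to-mask-to-image can be pictured with a small Python sketch. In the researchers' system the object masks are learned by the network. Here, as a simplifying assumption, a tiny hand-written mask stands in for a learned one, and the code only shows how such a mask might be stretched to its bounding box and placed on the image canvas. The name `place_mask`, the nearest-neighbor resizing, and the box format are illustrative choices, not details taken from the paper.

```python
def place_mask(obj_mask, box, height, width):
    """Resize a small object mask to its box and paste it onto an empty canvas.

    obj_mask: small 2D list of values in [0, 1] (stand-in for a learned mask).
    box: assumed format (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    Resizing uses nearest-neighbor sampling to keep the sketch short.
    """
    x0, y0, x1, y1 = box
    box_h, box_w = y1 - y0, x1 - x0
    src_h, src_w = len(obj_mask), len(obj_mask[0])
    canvas = [[0.0] * width for _ in range(height)]
    for dy in range(box_h):
        for dx in range(box_w):
            y, x = y0 + dy, x0 + dx
            if 0 <= y < height and 0 <= x < width:
                # Map each box pixel back to a cell of the small mask.
                sy = dy * src_h // box_h
                sx = dx * src_w // box_w
                canvas[y][x] = obj_mask[sy][sx]
    return canvas


# Hypothetical 2 x 2 "skier" mask placed in a 4 x 4 box on a 6 x 6 canvas.
skier_mask = [[0.0, 1.0],
              [1.0, 1.0]]
full_mask = place_mask(skier_mask, box=(1, 1, 5, 5), height=6, width=6)
for row in full_mask:
    print(row)
```

Moving the box while keeping the same mask and style code is one way to picture how an object could keep its identity from picture to picture.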

The researchers then created a method for their proposed layout-to-mask-to-image synthesis based on generative adversarial networks (GANs). The method allows for layout and style control at both the image and object levels. The researchers introduced an instance-sensitive and layout-aware normalization (ISLA-Norm) scheme for ensuring controllability.
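A rough intuition for an instance-sensitive, layout-aware normalization can be given in a few lines of Python. This is a simplified sketch under stated assumptions, not the published ISLA-Norm: it normalizes a single-channel feature map and then applies a scale and shift that vary per pixel, chosen according to which object's mask covers that pixel. The function name `layout_aware_norm`, the per-object `gammas` and `betas` (imagined here as derived from object style codes), and the mask-weighted averaging are all assumptions made for illustration.

```python
from math import sqrt


def layout_aware_norm(feature, masks, gammas, betas, eps=1e-5):
    """Normalize one feature channel, then scale and shift it per object.

    feature: H x W list of floats (a single channel, a simplification).
    masks: list of H x W soft masks in [0, 1], one per object.
    gammas, betas: one hypothetical scale and shift per object.
    Pixels covered by no mask keep the plain normalized value.
    """
    values = [v for row in feature for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = sqrt(var + eps)

    out = []
    for y, row in enumerate(feature):
        out_row = []
        for x, v in enumerate(row):
            weights = [m[y][x] for m in masks]
            total = sum(weights)
            if total > 0:
                # Mask-weighted average of the per-object parameters.
                gamma = sum(w * g for w, g in zip(weights, gammas)) / total
                beta = sum(w * b for w, b in zip(weights, betas)) / total
            else:
                gamma, beta = 1.0, 0.0
            out_row.append(gamma * (v - mean) / std + beta)
        out.append(out_row)
    return out


# One object covering only the left pixel of a 1 x 2 feature map.
feature = [[1.0, 3.0]]
masks = [[[1.0, 0.0]]]
normed = layout_aware_norm(feature, masks, gammas=[2.0], betas=[0.5])
print(normed)
```

Because the scale and shift follow the object masks, changing one object's parameters alters only the region that object occupies, which is the kind of object-level control the article describes.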

The team tested its new approach using the COCO-Stuff and Visual Genome data sets. Based on standard measures of image quality, the new approach outperformed previous image creation techniques. Although the researchers required a 4-GPU workstation for training the AI system, they said that deploying the system is less computationally expensive. “We found that one GPU gives you almost real-time speed,” Wu said.

One application for the method, the researchers said, could be to help autonomous robots “imagine” what the end result might look like before they begin an assigned task. “You could also use the system to generate images for AI training. So, instead of compiling images from external sources, you could use this system to create images for training other AI systems,” Wu said.

Next, the researchers plan to extend their approach to video and 3D images. They have made the source code for their method available on GitHub. “We’re always open to collaborating with industry partners,” Wu said.

The research was published in IEEE Transactions on Pattern Analysis and Machine Intelligence (www.ieeexplore.ieee.org/document/9427066).

Published: June 2021

Glossary

artificial intelligence: The ability of a machine to perform certain complex functions normally associated with human intelligence, such as judgment, pattern recognition, understanding, learning, planning, and problem solving.
deep learning: Deep learning is a subset of machine learning that involves the use of artificial neural networks to model and solve complex problems. The term "deep" in deep learning refers to the use of deep neural networks, which are neural networks with multiple layers (deep architectures). These networks, often called deep neural networks or deep neural architectures, have the ability to automatically learn hierarchical representations of data. Key concepts and components of deep learning include: ...
machine vision: Machine vision, also known as computer vision or computer sight, refers to the technology that enables machines, typically computers, to interpret and understand visual information from the world, much like the human visual system. It involves the development and application of algorithms and systems that allow machines to acquire, process, analyze, and make decisions based on visual data. Key aspects of machine vision include: Image acquisition: Machine vision systems use various...
computer vision: Computer vision enables computers to interpret and make decisions based on visual data, such as images and videos. It involves the development of algorithms, techniques, and systems that enable machines to gain an understanding of the visual world, similar to how humans perceive and interpret visual information. Key aspects and tasks within computer vision include: Image recognition: Identifying and categorizing objects, scenes, or patterns within images. This involves training...
