Researchers from George Washington University (GWU) have introduced a strategy for training deep neural networks that capitalizes on the benefits of using photonics to execute artificial intelligence (AI) applications. The team's CMOS-compatible silicon photonics architecture enabled parallel, ultrafast on-chip training of neural networks with low energy consumption.

According to professor Volker Sorger, the hardware underpinning the advancement will accelerate the training of machine learning systems and harness the best of what both photonic and electronic chips offer integrated systems.

Photonic integrated circuits (PICs) have shown the potential to deliver high computing performance, as measured by the number of operations they can perform per second per watt. However, although PICs have been shown to improve core machine intelligence operations used for data classification, photonic chips have yet to improve the actual front-end learning and training process.

"The training of AI systems costs a significant amount of energy and carbon footprint," said professor Bhavin Shastri. "For example, a single AI transformer takes about five times as much CO2 in electricity as a gasoline car spends in its lifetime. Our training on photonic chips will help to reduce this overhead."

The researchers implemented on-chip neural network training using a direct feedback alignment (DFA) training algorithm, an approach that uses error feedback, rather than error backpropagation, to train the neural network. The DFA algorithm can operate at speeds of trillions of multiply-accumulate (MAC) operations per second while consuming less than 1 pJ per MAC operation. In addition, unlike backpropagation, the DFA algorithm does not require network layers to be updated sequentially during the backward pass. Because the same output error is projected to every layer, all the network layers can be updated in parallel.

A multi-institution research team developed a silicon photonic architecture for training deep neural networks using direct feedback alignment. The advancement represents an acceleration for AI hardware and supports the implementation of ultrafast and highly efficient analog neural networks based on an optical platform. Courtesy of GWU.

The photonic architecture exploits parallelized matrix-vector multiplications using arrays of micro-ring resonators, incorporating wavelength division multiplexing techniques to process multichannel analog signals along a single waveguide bus. This allows the gradient vector for each neural network layer to be calculated in situ, in a single time step, so the architecture exploits the speed and efficiency of photonics to determine the gradient for every hidden layer in a single operational cycle.

Because in situ training takes place directly on the PIC, it can account for nonidealities in the analog hardware, making in situ training with DFA robust to noise. In situ training with photonic hardware can also eliminate the need for optical-to-electronic conversions, since it supports training on data signals generated natively in the optical domain. The photonic architecture is also highly scalable for training neural networks of variable sizes.

In experiments, the researchers demonstrated deep neural network training with the MNIST data set, using on-chip MAC operation results.
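To make the parallel backward pass concrete, here is a minimal NumPy sketch of direct feedback alignment for a small fully connected classifier. The network sizes, learning rate, and helper names are illustrative assumptions, not details of the team's on-chip implementation; the point is that each layer's update depends only on the shared output error and a fixed random feedback matrix, so no sequential chain of transposed weights is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(z):
    return (z > 0).astype(z.dtype)

# Illustrative fully connected network: 784 -> 256 -> 128 -> 10.
sizes = [784, 256, 128, 10]
W = [rng.normal(0.0, np.sqrt(2.0 / m), (m, n))
     for m, n in zip(sizes[:-1], sizes[1:])]

# DFA: each hidden layer gets its own FIXED random feedback matrix that
# projects the output error straight back to that layer, replacing the
# layer-by-layer transpose-weight chain of backpropagation.
B = [rng.normal(0.0, 1.0 / np.sqrt(sizes[-1]), (sizes[-1], n))
     for n in sizes[1:-1]]

def dfa_step(x, y_onehot, lr=0.01):
    # Forward pass, caching pre-activations for the local derivative terms.
    acts, pre = [x], []
    for i, w in enumerate(W):
        z = acts[-1] @ w
        pre.append(z)
        acts.append(z if i == len(W) - 1 else relu(z))
    # Softmax cross-entropy error at the output layer.
    p = np.exp(pre[-1] - pre[-1].max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    err = p - y_onehot
    # Backward "pass": every delta uses only the shared output error, so
    # all layers could in principle be computed in parallel.
    deltas = [(err @ B[i]) * relu_grad(pre[i])
              for i in range(len(W) - 1)] + [err]
    for i in range(len(W)):
        W[i] -= lr * acts[i].T @ deltas[i] / x.shape[0]
```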
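The gradient calculation the chip accelerates is, at its core, a matrix-vector multiply. Below is a deliberately simplified numerical stand-in for the wavelength-multiplexed micro-ring scheme, assuming each input element rides on its own wavelength, each row of weights acts as a bank of micro-ring transmissions on one waveguide bus, and a photodetector sums the channels on that bus. The noise term is a crude proxy for analog nonidealities; this is a toy model of the dataflow, not of the device physics.

```python
import numpy as np

def photonic_mvm(weights, x, noise_std=0.01, rng=np.random.default_rng(1)):
    """Toy model of a WDM micro-ring matrix-vector multiply.

    Each element of x is carried on its own wavelength channel; each row
    of `weights` models the transmissions of a micro-ring bank along one
    waveguide bus. A photodetector at the end of each bus sums all the
    wavelength channels at once, so every dot product (one row, one bus)
    completes in a single time step rather than n sequential MACs.
    """
    y = weights @ x                                # per-bus weighted sums
    y += noise_std * rng.standard_normal(y.shape)  # analog nonidealities
    return y

# Example: the single-sample analogue of one layer's DFA error projection
# (hypothetical shapes, matching the sketch above).
e = np.random.default_rng(2).standard_normal(10)          # output error
B1 = np.random.default_rng(3).standard_normal((256, 10))  # fixed feedback
delta_in = photonic_mvm(B1, e)                            # shape (256,)
```

Because the same error vector feeds every layer's projection, a bank of such buses can evaluate all the layer gradients in one operational cycle, and training directly through the noisy analog operation is consistent with the noise robustness the researchers report for in situ DFA.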
The DFA training algorithm performed well in simulation, even with noise added during the calculation of the gradient vector. Although the researchers focused on using photonics to implement the DFA algorithm's backward pass, they said inference could be performed using a similar photonic architecture. The researchers believe that the high throughput and low latency offered by PICs could enable the implementation of ultrafast and highly efficient analog neural networks.

"It is a major leap forward for AI hardware acceleration," Sorger said. "These are the kinds of advancements we need in the semiconductor industry, as underscored by the recently passed CHIPS Act."

The expected improvements in training time and energy efficiency offered by the photonic platform could enable the development of neural network applications that cannot run on current-generation hardware, the researchers said. In future work, the team plans to demonstrate a complete, integrated system with a dedicated CMOS processor capable of operating at high speeds for neural network training, without requiring any off-chip data processing.

The research was published in Optica (www.doi.org/10.1364/OPTICA.475493).