U.S. Army researchers have developed a reinforcement learning approach that will allow swarms of unmanned aerial and ground vehicles to achieve more consistent performance when executing mission objectives.

Reinforcement learning provides a way to control uncertain agents toward multi-objective goals when a precise model for the agent is unavailable. However, existing reinforcement learning methods can only be applied in a centralized manner, which requires pooling the state information of the entire swarm at a central learner. That drastically increases computational complexity and communication requirements, resulting in unreasonable learning times, said Jemin George of the U.S. Army Combat Capabilities Development Command’s Army Research Laboratory.

[Image: A small unmanned Clearpath Husky robot, used by Army Research Laboratory researchers to develop a new technique to quickly teach robots novel traversal behaviors with minimal human oversight. Courtesy of U.S. Army.]

To solve this, the researchers collaborated with Aranya Chakrabortty of North Carolina State University and He Bai of Oklahoma State University. The goal of the collaboration was to develop a theoretical foundation for data-driven control of large-scale swarm networks, where control actions are taken based on low-dimensional measurement data instead of dynamic models.

The result is an approach called hierarchical reinforcement learning (HRL), which decomposes the global control objective into multiple hierarchies: several small-group-level microscopic controls and a broad swarm-level macroscopic control.

“Each hierarchy has its own learning loop with respective local and global reward functions,” George said. “We were able to significantly reduce the learning time by running these learning loops in parallel.”

According to George, online reinforcement learning control of a swarm boils down to solving a large-scale algebraic matrix Riccati equation using system, or swarm, input-output data. The team’s initial approach to solving this large-scale matrix Riccati equation was to divide the swarm into multiple smaller groups and implement group-level reinforcement learning in parallel while executing a global reinforcement learning on a smaller-dimensional compressed state from each group.

The current HRL scheme instead uses a decoupling mechanism that allows the team to hierarchically approximate a solution to the large-scale matrix equation by first solving the local reinforcement learning problems and then synthesizing the global control from the local controllers (by solving a least squares problem), rather than running a global reinforcement learning on the aggregated state. This further reduces learning time.

[Image: Army researchers envision hierarchical control for coordinating ground and air vehicles. Courtesy of U.S. Army.]

Experiments showed that, compared to a centralized approach, HRL reduced learning time by 80% while limiting the loss of optimality to 5%.

“Our current HRL efforts will allow us to develop control policies for swarms of unmanned aerial and ground vehicles so that they can optimally accomplish different mission sets even though the individual dynamics for the swarming agents are unknown,” George said.

The team is working to further develop its HRL control scheme by considering optimal grouping of the agents in the swarm to minimize computation and communication complexity while limiting the optimality gap.
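To make the two-level structure concrete, below is a minimal Python sketch of the decomposition described above. It assumes known small-group dynamics purely for illustration (the actual scheme is data-driven and model-free): each group solves its own small algebraic Riccati equation, and a swarm-level control is then blended from the local controllers by least squares. All dimensions, matrices, and the swarm-level target command are hypothetical placeholders, not the team’s implementation.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)

def group_lqr_gain(A, B, Q, R):
    """Solve a group-level algebraic Riccati equation and return the LQR gain.

    Note: the method described in the article learns this from input-output
    data; using known (A, B) here is an illustrative simplification.
    """
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)  # K = R^{-1} B^T P

# Step 1: local learning loops, one small Riccati problem per group.
# (These run in parallel in the described scheme; sequential here for clarity.)
n_states, n_inputs, n_groups = 4, 2, 3
local_gains = []
for _ in range(n_groups):
    A = rng.standard_normal((n_states, n_states)) - 2.0 * np.eye(n_states)
    B = rng.standard_normal((n_states, n_inputs))
    local_gains.append(group_lqr_gain(A, B, np.eye(n_states), np.eye(n_inputs)))

# Step 2: synthesize the global (macroscopic) control from the local
# controllers via least squares: find blend weights w so that a combination
# of local actions best matches a swarm-level command (a placeholder here).
x = rng.standard_normal(n_states)                   # shared compressed state
U = np.column_stack([K @ x for K in local_gains])   # candidate local actions
u_target = rng.standard_normal(n_inputs)            # swarm-level command
w, *_ = np.linalg.lstsq(U, u_target, rcond=None)
u_global = U @ w                                    # synthesized global action
print("blend weights:", w)
```

The point the sketch mirrors is that every Riccati problem stays small: each group solves an equation of its own dimension rather than one the size of the whole swarm, which is what lets the learning loops run in parallel and keeps learning time down.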
The researchers are also investigating the use of deep recurrent neural networks to learn and predict the best grouping patterns, as well as the application of the developed techniques to the optimal coordination of autonomous air and ground vehicles in multi-domain operations in dense urban terrain.
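As a rough illustration of the kind of recurrent model this points toward, the hypothetical Python sketch below uses a GRU that reads a short history of swarm features and predicts a group assignment for each agent. The architecture, dimensions, and data are assumptions made for illustration, not the team’s network.

```python
import torch
import torch.nn as nn

class GroupingPredictor(nn.Module):
    """Toy recurrent model: swarm feature history -> per-agent group logits."""

    def __init__(self, n_agents=12, feat_dim=6, hidden=64, n_groups=3):
        super().__init__()
        self.gru = nn.GRU(input_size=n_agents * feat_dim,
                          hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_agents * n_groups)
        self.n_agents, self.n_groups = n_agents, n_groups

    def forward(self, x):
        # x: (batch, time, n_agents * feat_dim)
        out, _ = self.gru(x)
        logits = self.head(out[:, -1])  # predict from the last hidden state
        return logits.view(-1, self.n_agents, self.n_groups)

model = GroupingPredictor()
history = torch.randn(8, 20, 12 * 6)     # 8 trajectories, 20 time steps each
groups = model(history).argmax(dim=-1)   # predicted group index per agent
print(groups.shape)                      # torch.Size([8, 12])
```

In practice such a predictor would be trained on grouping patterns scored by whichever criterion the team optimizes, such as the computation and communication cost and the optimality gap mentioned above; the sketch only shows the input-output shape of the idea.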