U.S. Army advances novel machine learning approach to coordinate drone swarms
Hierarchical Reinforcement Learning technique cuts computation and communications complexity to reduce learning time by 80%
According to Dr. Jemin George of the U.S. Army Combat Capabilities Development Command’s Army Research Laboratory, the Army is looking to apply swarming technology to time-consuming or dangerous tasks.
“Finding optimal guidance policies for these swarming vehicles in real-time is a key requirement for enhancing warfighters’ tactical situational awareness, allowing the U.S. Army to dominate in a contested environment,” George said.
The team's approach focuses on reinforcement learning, a type of machine learning in which goal-oriented algorithms learn how to achieve a complex or long-term goal over many steps. Current schemes require data to be pooled in a centralized "learner," an approach that depends heavily on complex computation and communication systems and therefore requires a long learning time.
To solve this issue, the Army researchers collaborated with Prof. Aranya Chakrabortty from North Carolina State University and Prof. He Bai from Oklahoma State University to develop an approach called Hierarchical Reinforcement Learning, or HRL. In essence, HRL breaks down a large overall objective into hierarchies: multiple small group-level microscopic controls and a broad swarm-level macroscopic control.
“Each hierarchy has its own learning loop with respective local and global reward functions,” George said. “We were able to significantly reduce the learning time by running these learning loops in parallel.”
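The idea of separate learning loops with local and global rewards, run in parallel, can be illustrated with a toy sketch. This is not the researchers' algorithm: the quadratic reward functions, the scalar feedback gain, the hill-climbing learner, and all names (`local_reward`, `global_reward`, `learn_gain`, `train_hierarchy`) are assumptions chosen only to keep the example small. Each loop sees only its own reward signal and never a model of the agents' dynamics.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def local_reward(gain, offsets):
    """Group-level (microscopic) reward: negative squared spread of the
    group's agents around their centroid after one control step.
    Toy assumption: a gain of 1.0 removes the offsets entirely."""
    return -sum(((1.0 - gain) * x) ** 2 for x in offsets)


def global_reward(gain, centroid_error):
    """Swarm-level (macroscopic) reward: negative squared distance of the
    swarm centroid from its target after one control step (toy assumption)."""
    return -((1.0 - gain) * centroid_error) ** 2


def learn_gain(reward_fn, seed, steps=200, step_size=0.1):
    """One learning loop: random hill climbing on a scalar gain.
    The learner only queries the reward; it never sees the dynamics."""
    rng = random.Random(seed)
    gain = 0.0
    best = reward_fn(gain)
    for _ in range(steps):
        candidate = gain + rng.uniform(-step_size, step_size)
        reward = reward_fn(candidate)
        if reward > best:  # keep the candidate only if it improves the reward
            gain, best = candidate, reward
    return gain, best


def train_hierarchy(groups, centroid_error):
    """Run one loop per group plus one swarm-level loop, all concurrently."""
    loops = [lambda g, o=offsets: local_reward(g, o) for offsets in groups]
    loops.append(lambda g: global_reward(g, centroid_error))
    # Threads stand in for independent learners here; a real system would
    # place each loop on its own processor or vehicle.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(learn_gain, loops, range(len(loops))))
    *local_results, global_result = results
    return local_results, global_result


if __name__ == "__main__":
    # Hypothetical swarm: three groups, each listed as agent offsets
    # from the group centroid, plus a swarm centroid 4 units off target.
    groups = [[0.5, -0.3, 0.2], [1.0, -1.2], [0.1, 0.4, -0.6, 0.3]]
    local_results, global_result = train_hierarchy(groups, centroid_error=4.0)
    for i, (gain, reward) in enumerate(local_results):
        print(f"group {i}: gain={gain:.3f} reward={reward:.5f}")
    print(f"swarm:   gain={global_result[0]:.3f} reward={global_result[1]:.5f}")
```

Because no loop waits on another, the elapsed training time is set by the slowest single loop rather than by the sum of all of them, which is the intuition behind the reported reduction in learning time.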
According to the research team, experiments have shown that HRL reduced learning time by 80% while limiting the optimality loss to 5% when compared to a centralized approach.
“Our current HRL efforts will allow us to develop control policies for swarms of unmanned aerial and ground vehicles so that they can optimally accomplish different mission sets even though the individual dynamics for the swarming agents are unknown,” George said.
Currently, the team is working to improve their HRL control scheme by considering optimal grouping of agents in the swarm to minimize computation and communication complexity while limiting the optimality gap.
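Why the choice of grouping matters for communication complexity can be seen with a rough link count. The cost model below is an assumption made for illustration, not the team's: every agent in a group exchanges data with every other agent in that group, and one representative per group exchanges data with the other groups' representatives. It ignores the optimality gap entirely. The helper names (`pairwise_links`, `hierarchical_links`, `even_groups`) are hypothetical.

```python
def pairwise_links(n):
    """Undirected links needed if n nodes all exchange data with one another."""
    return n * (n - 1) // 2


def hierarchical_links(group_sizes):
    """Links inside each group plus links among one representative per group
    (assumed cost model, not taken from the research)."""
    within = sum(pairwise_links(m) for m in group_sizes)
    between = pairwise_links(len(group_sizes))
    return within + between


def even_groups(n_agents, n_groups):
    """Split n_agents into n_groups group sizes that differ by at most one."""
    base, extra = divmod(n_agents, n_groups)
    return [base + 1] * extra + [base] * (n_groups - extra)


if __name__ == "__main__":
    n = 20  # hypothetical swarm size
    print("ungrouped:", pairwise_links(n), "links")
    for k in (2, 4, 5, 10):
        print(f"{k} groups:", hierarchical_links(even_groups(n, k)), "links")
```

Under this model, a 20-agent swarm needs 190 links when ungrouped but only 40 when split into five groups of four, and the count rises again as groups become too small or too few. A real grouping criterion would also have to weigh how much each split costs in optimality.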
They are also investigating the use of deep recurrent neural networks to learn and predict the best grouping patterns, as well as the application of the developed techniques to the optimal coordination of autonomous air and ground vehicles in Multi-Domain Operations in dense urban terrain.