Towards a Pretrained Model for Restless Bandits via Multi-Arm Generalization

Zhao, Yunfan; Behari, Nikhil; Hughes, Edward; Zhang, Edwin; Nagaraj, Dheeraj; Tuyls, Karl; Taneja, Aparna; Tambe, Milind

doi:10.24963/ijcai.2024/36

IJCAI 2024 pp. 321-329

Abstract

Ensuring the safety of high-speed agents in dynamic adversarial environments, such as pursuit-evasion games with target-purchase and obstacle-avoidance, is a significant challenge. Existing reinforcement learning methods often fail to balance safety and reward under strict safety constraints and diverse environmental conditions. To address these limitations, this paper proposes a novel zero-constraint-violation recovery RL framework tailored for high-speed UAV pursuit-evasion combat games. The framework includes three key innovations. (1) An extendable multi-step reach-avoid theory: we provide a zero-constraint-violation safety guarantee for multi-strategy reinforcement learning and enable early danger detection in high-speed games. (2) A masked-attention recovery strategy: we introduce a padding-mask attention architecture to handle spatiotemporal variations in dynamic obstacles with varying threat levels. (3) Experimental validation: we validate the framework in obstacle-rich pursuit-evasion scenarios, demonstrating its superiority through comparisons with other algorithms and ablation studies. Our approach also shows potential for extension to other rapid-motion tasks and more complex hazardous scenarios. Details and code can be found at https://msmar-rl.github.io.

Cite

Text

Zhao et al. "Towards a Pretrained Model for Restless Bandits via Multi-Arm Generalization." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/36

Markdown

[Zhao et al. "Towards a Pretrained Model for Restless Bandits via Multi-Arm Generalization." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/zhao2024ijcai-pretrained/) doi:10.24963/ijcai.2024/36

BibTeX

@inproceedings{zhao2024ijcai-pretrained,
  title     = {{Towards a Pretrained Model for Restless Bandits via Multi-Arm Generalization}},
  author    = {Zhao, Yunfan and Behari, Nikhil and Hughes, Edward and Zhang, Edwin and Nagaraj, Dheeraj and Tuyls, Karl and Taneja, Aparna and Tambe, Milind},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {321-329},
  doi       = {10.24963/ijcai.2024/36},
  url       = {https://mlanthology.org/ijcai/2024/zhao2024ijcai-pretrained/}
}