Robust Deep Reinforcement Learning Against Adversarial Behavior Manipulation

Abstract

This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. Behavior-targeted attacks aim to steer the victim's behavior toward whatever the adversary desires by adversarially perturbing the victim's state observations. Existing behavior-targeted attacks have limitations, such as requiring white-box access to the victim's policy. To address these limitations, we propose a novel attack method using imitation learning from adversarial demonstrations, which works under limited access to the victim's policy and is environment-agnostic. In addition, our theoretical analysis proves that the policy's sensitivity to state changes affects defense performance, particularly in the early stages of the trajectory. Based on this insight, we propose time-discounted regularization, which enhances robustness against attacks while maintaining task performance. To the best of our knowledge, this is the first defense strategy specifically designed for behavior-targeted attacks.
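
The abstract does not spell out the form of the regularizer. As a purely illustrative sketch, and not the authors' formulation, one way to read "time-discounted regularization" is as a penalty on how much the policy's output changes under a perturbed observation, weighted so that early time steps count most. The function name, the squared-distance measure, and the `gamma ** t` weighting below are all assumptions made for illustration.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def time_discounted_sensitivity_penalty(
    policy: Callable[[Vector], Vector],
    states: Sequence[Vector],
    perturbed_states: Sequence[Vector],
    gamma: float = 0.99,
) -> float:
    """Hypothetical penalty: sum over t of gamma**t * ||policy(s_t) - policy(s'_t)||^2.

    `states` are the clean observations along one trajectory and
    `perturbed_states` are the corresponding perturbed observations.
    With 0 < gamma < 1, a policy that is sensitive early in the
    trajectory is penalized more than one that is sensitive later.
    """
    if len(states) != len(perturbed_states):
        raise ValueError("states and perturbed_states must have the same length")

    penalty = 0.0
    for t, (s, s_adv) in enumerate(zip(states, perturbed_states)):
        clean_out = policy(s)
        adv_out = policy(s_adv)
        # Squared Euclidean distance between the two policy outputs (an assumed choice).
        gap = sum((a - b) ** 2 for a, b in zip(clean_out, adv_out))
        penalty += (gamma ** t) * gap
    return penalty


def toy_policy(s: Vector) -> Vector:
    # Toy deterministic linear policy, used only to exercise the function.
    return [2.0 * x for x in s]


clean = [[0.0], [0.0]]
early_attack = [[1.0], [0.0]]  # observation perturbed only at t = 0
late_attack = [[0.0], [1.0]]   # observation perturbed only at t = 1

early = time_discounted_sensitivity_penalty(toy_policy, clean, early_attack, gamma=0.5)
late = time_discounted_sensitivity_penalty(toy_policy, clean, late_attack, gamma=0.5)
```

In this toy setting the same perturbation costs more when it occurs at the first step than at the second, which mirrors the abstract's claim that sensitivity matters most early in the trajectory. In training, such a term would typically be added to the task loss with a weighting coefficient; that coefficient and how perturbed states are generated are left unspecified here.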

Cite

Text

Yamabe et al. "Robust Deep Reinforcement Learning Against Adversarial Behavior Manipulation." International Conference on Learning Representations, 2026.

Markdown

[Yamabe et al. "Robust Deep Reinforcement Learning Against Adversarial Behavior Manipulation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/yamabe2026iclr-robust/)

BibTeX

@inproceedings{yamabe2026iclr-robust,
  title     = {{Robust Deep Reinforcement Learning Against Adversarial Behavior Manipulation}},
  author    = {Yamabe, Shojiro and Fukuchi, Kazuto and Sakuma, Jun},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/yamabe2026iclr-robust/}
}