SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents

Abstract

Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications -- making it paramount to ensure the robustness of RL algorithms against adversarial attacks. In this work we explore a particularly stealthy form of training-time attack against RL -- backdoor poisoning. Here the adversary intercepts the training of an RL agent with the goal of reliably inducing a particular action when the agent observes a pre-determined trigger at inference time. We uncover theoretical limitations of prior work by proving its inability to generalize across domains and MDPs. Motivated by this, we formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy -- guaranteeing attack success in the limit. Using insights from our theoretical analysis we develop "SleeperNets" as a universal backdoor attack which exploits a newly proposed threat model and leverages dynamic reward poisoning techniques. We evaluate our attack in 6 environments spanning multiple domains and demonstrate significant improvements in attack success over existing methods, while preserving benign episodic return.

Cite

Text

Rathbun et al. "SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents." Neural Information Processing Systems, 2024. doi:10.52202/079017-3556

Markdown

[Rathbun et al. "SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/rathbun2024neurips-sleepernets/) doi:10.52202/079017-3556

BibTeX

@inproceedings{rathbun2024neurips-sleepernets,
  title     = {{SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents}},
  author    = {Rathbun, Ethan and Amato, Christopher and Oprea, Alina},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3556},
  url       = {https://mlanthology.org/neurips/2024/rathbun2024neurips-sleepernets/}
}