SHINE: Shielding Backdoors in Deep Reinforcement Learning

Abstract

Recent studies have discovered that a deep reinforcement learning (DRL) policy is vulnerable to backdoor attacks. Existing defenses against backdoor attacks either do not consider RL’s unique mechanism or make unrealistic assumptions, resulting in limited defense efficacy, practicability, and generalizability. We propose SHINE, a backdoor shielding method specific to DRL. SHINE designs novel policy explanation techniques to identify the backdoor triggers and a policy retraining algorithm to eliminate the impact of the triggers on backdoored agents. We theoretically justify that SHINE is guaranteed to improve a backdoored agent’s performance in a poisoned environment while ensuring that its performance difference in the clean environment before and after shielding is bounded. We further conduct extensive experiments evaluating SHINE against three mainstream DRL backdoor attacks in various benchmark RL environments. Our results show that SHINE significantly outperforms existing defenses in mitigating these backdoor attacks.

Cite

Text

Yuan et al. "SHINE: Shielding Backdoors in Deep Reinforcement Learning." International Conference on Machine Learning, 2024.

Markdown

[Yuan et al. "SHINE: Shielding Backdoors in Deep Reinforcement Learning." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/yuan2024icml-shine/)

BibTeX

@inproceedings{yuan2024icml-shine,
  title     = {{SHINE: Shielding Backdoors in Deep Reinforcement Learning}},
  author    = {Yuan, Zhuowen and Guo, Wenbo and Jia, Jinyuan and Li, Bo and Song, Dawn},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {57887--57904},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/yuan2024icml-shine/}
}