Reinforcement Learning Under Threats

Abstract

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. However, when such non-stationary environments are considered, Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretical approaches to this problem have focused on modeling the whole multi-agent system as a game. Instead, we face the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. We empirically test our framework, showing the benefits of opponent modeling.
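To make the idea concrete, the following is a minimal sketch of what a TMDP-style learning step could look like: the DM keeps Q-values over joint (own action, adversary action) pairs, maintains a simple count-based model of the adversary, and acts greedily with respect to the Q-values averaged under that model. The specific update rule, the count-based opponent model, and all names (n_states, opp_counts, etc.) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Illustrative sketch (assumptions, not the paper's exact method):
# Q(s, a, b) is indexed by state, DM action a, and adversary action b.
n_states, n_dm_actions, n_adv_actions = 10, 4, 4
Q = np.zeros((n_states, n_dm_actions, n_adv_actions))
opp_counts = np.ones((n_states, n_adv_actions))  # count-based adversary model, p(b | s)
alpha, gamma, eps = 0.1, 0.95, 0.1


def dm_action(s, rng):
    """Epsilon-greedy over Q-values averaged w.r.t. the estimated opponent model."""
    if rng.random() < eps:
        return int(rng.integers(n_dm_actions))
    p_adv = opp_counts[s] / opp_counts[s].sum()   # estimated p(b | s)
    expected_q = Q[s] @ p_adv                     # E_b[Q(s, a, b)] for each DM action a
    return int(np.argmax(expected_q))


def update(s, a, b, r, s_next):
    """One learning step after observing the adversary's action b and reward r."""
    opp_counts[s, b] += 1                         # refine the opponent model
    p_next = opp_counts[s_next] / opp_counts[s_next].sum()
    v_next = np.max(Q[s_next] @ p_next)           # value of s_next under the current model
    Q[s, a, b] += alpha * (r + gamma * v_next - Q[s, a, b])
```

The design choice worth noting is that the DM does not treat the environment as stationary: its greedy policy depends on the current opponent model, so as the adversary's behavior drifts, the averaged Q-values and the induced policy drift with it.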

Cite

Text

Gallego et al. "Reinforcement Learning Under Threats." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33019939

Markdown

[Gallego et al. "Reinforcement Learning Under Threats." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/gallego2019aaai-reinforcement/) doi:10.1609/AAAI.V33I01.33019939

BibTeX

@inproceedings{gallego2019aaai-reinforcement,
  title     = {{Reinforcement Learning Under Threats}},
  author    = {Gallego, Víctor and Naveiro, Roi and Insua, David Ríos},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {9939--9940},
  doi       = {10.1609/AAAI.V33I01.33019939},
  url       = {https://mlanthology.org/aaai/2019/gallego2019aaai-reinforcement/}
}