Reinforcement Learning Under Threats
Abstract
In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward-generating process. However, in such non-stationary environments, standard Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretic approaches to this problem have focused on modeling the whole multi-agent system as a game. Instead, we address the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. We empirically test our framework, showing the benefits of opponent modeling.
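As a rough illustration of the opponent-modeling idea summarized in the abstract, the following minimal sketch (not the authors' implementation) shows a tabular Q-learner whose Q-values are indexed by both the DM's action and the adversary's action, and whose action selection averages Q-values over an estimated opponent model. The class and method names (`ThreatAwareQLearner`, `opponent_probs`, `update`) and the simple count-based opponent model are assumptions made for illustration only.

```python
import numpy as np

class ThreatAwareQLearner:
    """Sketch of a TMDP-style tabular learner: Q is indexed by
    (state, DM action, adversary action); the DM acts by averaging
    Q-values over an empirical model of the adversary's behavior."""

    def __init__(self, n_states, n_actions_dm, n_actions_adv,
                 alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.Q = np.zeros((n_states, n_actions_dm, n_actions_adv))
        # Count-based (Dirichlet-style) estimate of p(adversary action | state).
        self.opp_counts = np.ones((n_states, n_actions_adv))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng(seed)

    def opponent_probs(self, s):
        counts = self.opp_counts[s]
        return counts / counts.sum()

    def act(self, s):
        # Epsilon-greedy over Q-values averaged w.r.t. the opponent model.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.Q.shape[1]))
        expected_q = self.Q[s] @ self.opponent_probs(s)  # shape: (n_actions_dm,)
        return int(np.argmax(expected_q))

    def update(self, s, a, b, r, s_next):
        # Record the adversary's observed action b to refine the opponent model.
        self.opp_counts[s, b] += 1
        # Bootstrap with the DM's best action under the opponent model at s_next.
        next_expected = self.Q[s_next] @ self.opponent_probs(s_next)
        target = r + self.gamma * np.max(next_expected)
        self.Q[s, a, b] += self.alpha * (target - self.Q[s, a, b])
```

In the level-k scheme described above, such a learner would be nested: the adversary is itself modeled as a lower-level learner best-responding to a model of the DM.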
Cite
Text
Gallego et al. "Reinforcement Learning Under Threats." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33019939
Markdown
[Gallego et al. "Reinforcement Learning Under Threats." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/gallego2019aaai-reinforcement/) doi:10.1609/AAAI.V33I01.33019939
BibTeX
@inproceedings{gallego2019aaai-reinforcement,
title = {{Reinforcement Learning Under Threats}},
author = {Gallego, Víctor and Naveiro, Roi and Insua, David Ríos},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2019},
pages = {9939-9940},
doi = {10.1609/AAAI.V33I01.33019939},
url = {https://mlanthology.org/aaai/2019/gallego2019aaai-reinforcement/}
}