Model and Reinforcement Learning for Markov Games with Risk Preferences
Abstract
We motivate and propose a new model for non-cooperative Markov games that captures the interactions of risk-aware players. This model characterizes the time-consistent dynamic "risk" from both stochastic state transitions (inherent to the game) and randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed, and the existence of such equilibria in stationary strategies is demonstrated by an application of Kakutani's fixed point theorem. We further propose a simulation-based Q-learning-type algorithm for risk-aware equilibrium computation. This algorithm works with a special form of minimax risk measures, which can naturally be written as saddle-point stochastic optimization problems and cover many widely investigated risk measures. Finally, the almost sure convergence of this simulation-based algorithm to an equilibrium is demonstrated under some mild conditions. Our numerical experiments on a two-player queueing game validate the properties of our model and algorithm, and demonstrate their applicability to real-life competitive decision-making.
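To make the risk-aware Q-learning idea concrete, here is a minimal single-agent sketch in Python. It replaces the usual expectation over next states with an empirical CVaR, used here only as a stand-in risk measure; the function names `cvar` and `risk_aware_q_update`, the sampling interface, and the CVaR choice are illustrative assumptions, not the authors' algorithm, which handles multi-player equilibria and general minimax risk measures through a saddle-point formulation.

```python
import numpy as np

def cvar(samples, alpha=0.1):
    # Empirical Conditional Value-at-Risk of a cost sample:
    # the mean of the worst alpha-fraction of outcomes.
    k = max(1, int(np.ceil(alpha * len(samples))))
    worst = np.sort(samples)[-k:]  # the k largest costs
    return worst.mean()

def risk_aware_q_update(Q, s, a, cost, next_states, gamma=0.95, lr=0.1, alpha=0.1):
    # One tabular Q-learning step where the usual expectation over next states
    # is replaced by an empirical CVaR of the greedy continuation cost.
    # `next_states` are sampled from a simulator at the state-action pair (s, a).
    continuation = np.array([Q[s_next].min() for s_next in next_states])
    target = cost + gamma * cvar(continuation, alpha)
    Q[s, a] += lr * (target - Q[s, a])
    return Q

# Toy usage on a hypothetical 3-state, 2-action problem with exponential costs.
rng = np.random.default_rng(0)
Q = np.zeros((3, 2))
for _ in range(1000):
    s, a = rng.integers(3), rng.integers(2)
    cost = rng.exponential(1.0)
    next_states = rng.integers(3, size=20)  # simulated next-state draws
    Q = risk_aware_q_update(Q, s, a, cost, next_states)
```

Choosing CVaR here simply makes the risk-sensitive target easy to compute from samples; in the paper's setting, the risk measure is one of the minimax family and the update interacts with the other players' mixed strategies.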
Cite
Text
Huang et al. "Model and Reinforcement Learning for Markov Games with Risk Preferences." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I02.5574
Markdown
[Huang et al. "Model and Reinforcement Learning for Markov Games with Risk Preferences." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/huang2020aaai-model/) doi:10.1609/AAAI.V34I02.5574
BibTeX
@inproceedings{huang2020aaai-model,
title = {{Model and Reinforcement Learning for Markov Games with Risk Preferences}},
author = {Huang, Wenjie and Hai, Pham Viet and Haskell, William Benjamin},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2020},
pages = {2022--2029},
doi = {10.1609/AAAI.V34I02.5574},
url = {https://mlanthology.org/aaai/2020/huang2020aaai-model/}
}