Policy Invariance Under Reward Transformations for General-Sum Stochastic Games

Abstract

We extend the potential-based shaping method from Markov decision processes to multiplayer general-sum stochastic games. We prove that the Nash equilibria in a stochastic game remain unchanged after potential-based shaping is applied to the environment. The property of policy invariance provides a possible way of speeding up convergence when learning to play a stochastic game.
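The single-agent transformation being extended can be sketched as follows: the shaping term is the potential difference F(s, s') = γΦ(s') − Φ(s), added to the environment reward. The potential function `phi` and the numeric values here are illustrative assumptions, not taken from the paper:

```python
def shaped_reward(reward, state, next_state, phi, gamma=0.9):
    """Return r + F(s, s'), where F(s, s') = gamma * phi(s') - phi(s).

    Because F telescopes along any trajectory, the shaped game has the
    same optimal (and, per this paper, Nash equilibrium) policies.
    """
    return reward + gamma * phi(next_state) - phi(state)


# Example: a potential equal to the state's position on a 1-D chain.
phi = lambda s: float(s)
print(shaped_reward(1.0, state=0, next_state=1, phi=phi, gamma=0.9))  # → 1.9
```

In the multiplayer setting considered by the paper, each player i receives its own shaping term built from a potential over the joint state, and the equilibrium set is preserved.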

Cite

Text

Lu et al. "Policy Invariance Under Reward Transformations for General-Sum Stochastic Games." Journal of Artificial Intelligence Research, 2011. doi:10.1613/JAIR.3384

Markdown

[Lu et al. "Policy Invariance Under Reward Transformations for General-Sum Stochastic Games." Journal of Artificial Intelligence Research, 2011.](https://mlanthology.org/jair/2011/lu2011jair-policy/) doi:10.1613/JAIR.3384

BibTeX

@article{lu2011jair-policy,
  title     = {{Policy Invariance Under Reward Transformations for General-Sum Stochastic Games}},
  author    = {Lu, Xiaosong and Schwartz, Howard M. and Givigi, Sidney Nascimento},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2011},
  pages     = {397--406},
  doi       = {10.1613/JAIR.3384},
  volume    = {41},
  url       = {https://mlanthology.org/jair/2011/lu2011jair-policy/}
}