Policy Invariance Under Reward Transformations for General-Sum Stochastic Games
Abstract
We extend the potential-based shaping method from Markov decision processes to multi-player general-sum stochastic games. We prove that the Nash equilibria of a stochastic game remain unchanged after potential-based shaping is applied to the environment. This policy-invariance property offers a way to speed up convergence when learning to play a stochastic game.
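As a quick illustration, a minimal sketch of potential-based shaping in the single-agent form that the paper extends: the shaping term F(s, s') = γΦ(s') − Φ(s) is added to the reward. The grid states and the potential function `phi` below are illustrative assumptions, not from the paper.

```python
# Minimal sketch of potential-based reward shaping, the technique this
# paper extends to general-sum stochastic games. The potential `phi`
# and the grid-world states are hypothetical, for illustration only.

GAMMA = 0.9  # discount factor

def phi(state):
    # Hypothetical potential: negative Manhattan distance to a goal at (3, 3).
    x, y = state
    return -(abs(3 - x) + abs(3 - y))

def shaped_reward(reward, state, next_state, gamma=GAMMA):
    # F(s, s') = gamma * phi(s') - phi(s); adding F to every player's
    # reward leaves the equilibria of the game unchanged.
    return reward + gamma * phi(next_state) - phi(state)

# A step toward the goal earns a positive shaping bonus.
print(shaped_reward(0.0, (0, 0), (1, 0)))
```

In the multi-player setting, the paper's result is that applying such a shaping term to each player's reward preserves the set of Nash equilibria of the stochastic game.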
Cite
Text
Lu et al. "Policy Invariance Under Reward Transformations for General-Sum Stochastic Games." Journal of Artificial Intelligence Research, 2011. doi:10.1613/JAIR.3384
Markdown
[Lu et al. "Policy Invariance Under Reward Transformations for General-Sum Stochastic Games." Journal of Artificial Intelligence Research, 2011.](https://mlanthology.org/jair/2011/lu2011jair-policy/) doi:10.1613/JAIR.3384
BibTeX
@article{lu2011jair-policy,
title = {{Policy Invariance Under Reward Transformations for General-Sum Stochastic Games}},
author = {Lu, Xiaosong and Schwartz, Howard M. and Givigi, Sidney Nascimento},
journal = {Journal of Artificial Intelligence Research},
year = {2011},
pages = {397--406},
doi = {10.1613/JAIR.3384},
volume = {41},
url = {https://mlanthology.org/jair/2011/lu2011jair-policy/}
}