Learning Near-Pareto-Optimal Conventions in Polynomial Time
Abstract
We study how to learn to play a Pareto-optimal strict Nash equilibrium when there exist multiple equilibria and agents may have different pref- erences among the equilibria. We focus on repeated coordination games of non-identical interest where agents do not know the game structure up front and receive noisy payoffs. We design efficient near-optimal al- gorithms for both the perfect monitoring and the imperfect monitoring setting(where the agents only observe their own payoffs and the joint actions).
Cite
Text
Wang and Sandholm. "Learning Near-Pareto-Optimal Conventions in Polynomial Time." Neural Information Processing Systems, 2003.Markdown
[Wang and Sandholm. "Learning Near-Pareto-Optimal Conventions in Polynomial Time." Neural Information Processing Systems, 2003.](https://mlanthology.org/neurips/2003/wang2003neurips-learning/)BibTeX
@inproceedings{wang2003neurips-learning,
title = {{Learning Near-Pareto-Optimal Conventions in Polynomial Time}},
author = {Wang, Xiaofeng and Sandholm, Tuomas},
booktitle = {Neural Information Processing Systems},
year = {2003},
pages = {863-870},
url = {https://mlanthology.org/neurips/2003/wang2003neurips-learning/}
}