Learning Near-Pareto-Optimal Conventions in Polynomial Time

Abstract

We study how to learn to play a Pareto-optimal strict Nash equilibrium when there exist multiple equilibria and agents may have different pref- erences among the equilibria. We focus on repeated coordination games of non-identical interest where agents do not know the game structure up front and receive noisy payoffs. We design efficient near-optimal al- gorithms for both the perfect monitoring and the imperfect monitoring setting(where the agents only observe their own payoffs and the joint actions).

Cite

Text

Wang and Sandholm. "Learning Near-Pareto-Optimal Conventions in Polynomial Time." Neural Information Processing Systems, 2003.

Markdown

[Wang and Sandholm. "Learning Near-Pareto-Optimal Conventions in Polynomial Time." Neural Information Processing Systems, 2003.](https://mlanthology.org/neurips/2003/wang2003neurips-learning/)

BibTeX

@inproceedings{wang2003neurips-learning,
  title     = {{Learning Near-Pareto-Optimal Conventions in Polynomial Time}},
  author    = {Wang, Xiaofeng and Sandholm, Tuomas},
  booktitle = {Neural Information Processing Systems},
  year      = {2003},
  pages     = {863-870},
  url       = {https://mlanthology.org/neurips/2003/wang2003neurips-learning/}
}