Greedy When Sure and Conservative When Uncertain About the Opponents

Abstract

We develop a new approach, named Greedy when Sure and Conservative when Uncertain (GSCU), for competing online against unknown and nonstationary opponents. GSCU improves over prior work in four aspects: 1) it introduces a novel way of learning opponent policy embeddings offline; 2) it trains offline a single best response (conditioned additionally on our opponent policy embedding) against any opponent, instead of a finite set of separate best responses; 3) it computes online a posterior over the current opponent's policy embedding, avoiding the discrete and often ineffective decision of which type the current opponent belongs to; and 4) it selects online between a real-time greedy policy and a fixed conservative policy via an adversarial bandit algorithm, achieving a provably lower regret than adhering to either alone. Experimental studies on popular benchmarks demonstrate GSCU's superiority over state-of-the-art methods. The code is available online at \url{https://github.com/YeTianJHU/GSCU}.
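Step 4 above frames the online choice between the greedy and conservative policies as an adversarial bandit problem. Below is a minimal sketch of that idea using the standard EXP3 algorithm over two arms; the specific bandit variant, reward definition, and hyperparameters used by GSCU are given in the paper, so everything here (the `gamma` value, the toy reward distributions, the function name `exp3_select`) is a hypothetical illustration, not the authors' implementation.

```python
import math
import random


def exp3_select(num_rounds=1000, gamma=0.1, seed=0):
    """EXP3 over two 'arms': arm 0 = real-time greedy policy,
    arm 1 = fixed conservative policy.

    Rewards are a toy stand-in (hypothetical): the greedy arm pays more
    on average, modeling a round where the opponent posterior is accurate.
    """
    rng = random.Random(seed)
    weights = [1.0, 1.0]
    picks = [0, 0]
    for _ in range(num_rounds):
        total = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / 2 for w in weights]
        arm = 0 if rng.random() < probs[0] else 1
        picks[arm] += 1
        # Toy reward in [0, 1]: the greedy arm is better in this scenario.
        reward = rng.uniform(0.6, 1.0) if arm == 0 else rng.uniform(0.3, 0.7)
        # Importance-weighted reward estimate keeps the update unbiased.
        est = reward / probs[arm]
        weights[arm] *= math.exp(gamma * est / 2)
        # Rescale to avoid overflow; the probabilities are unchanged.
        m = max(weights)
        weights = [w / m for w in weights]
    return probs, picks
```

Because EXP3's regret guarantee holds against adversarially chosen rewards, this selection layer inherits a regret bound relative to the better of the two policies in hindsight, which is the sense in which GSCU does no worse than committing to either one.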

Cite

Text

Fu et al. "Greedy When Sure and Conservative When Uncertain About the Opponents." International Conference on Machine Learning, 2022.

Markdown

[Fu et al. "Greedy When Sure and Conservative When Uncertain About the Opponents." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/fu2022icml-greedy/)

BibTeX

@inproceedings{fu2022icml-greedy,
  title     = {{Greedy When Sure and Conservative When Uncertain About the Opponents}},
  author    = {Fu, Haobo and Tian, Ye and Yu, Hongxiang and Liu, Weiming and Wu, Shuang and Xiong, Jiechao and Wen, Ying and Li, Kai and Xing, Junliang and Fu, Qiang and Yang, Wei},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {6829--6848},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/fu2022icml-greedy/}
}