Exploration by Optimisation in Partial Monitoring
Abstract
We provide a novel algorithm for adversarial k-action d-outcome partial monitoring that is adaptive, intuitive and efficient. The highlight is that for the non-degenerate locally observable games, the n-round minimax regret is bounded by 6m k^(3/2) sqrt(n log(k)), where m is the number of signals. This matches the best known information-theoretic upper bound derived via Bayesian minimax duality. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games. High probability bounds and simple experiments are also provided.
Cite
Text
Lattimore and Szepesvári. "Exploration by Optimisation in Partial Monitoring." Conference on Learning Theory, 2020.Markdown
[Lattimore and Szepesvári. "Exploration by Optimisation in Partial Monitoring." Conference on Learning Theory, 2020.](https://mlanthology.org/colt/2020/lattimore2020colt-exploration/)BibTeX
@inproceedings{lattimore2020colt-exploration,
title = {{Exploration by Optimisation in Partial Monitoring}},
author = {Lattimore, Tor and Szepesvári, Csaba},
booktitle = {Conference on Learning Theory},
year = {2020},
pages = {2488-2515},
volume = {125},
url = {https://mlanthology.org/colt/2020/lattimore2020colt-exploration/}
}