Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring

Tsuchiya, Taira; Ito, Shinji; Honda, Junya

ICML 2024 pp. 48768-48790

Abstract

Partial monitoring is a generic framework of online decision-making problems with limited feedback. To make decisions from such limited feedback, it is necessary to find an appropriate distribution for exploration. Recently, a powerful approach for this purpose, exploration by optimization (ExO), was proposed, which achieves optimal bounds in adversarial environments with follow-the-regularized-leader for a wide range of online decision-making problems. However, a naive application of ExO in stochastic environments significantly degrades regret bounds. To resolve this issue in locally observable games, we first establish a new framework and analysis for ExO with a hybrid regularizer. This development allows us to significantly improve existing regret bounds of best-of-both-worlds (BOBW) algorithms, which achieve nearly optimal bounds in both stochastic and adversarial environments. In particular, we derive a stochastic regret bound of $O(\sum_{a \neq a^*} k^2 m^2 \log T / \Delta_a)$, where $k$, $m$, and $T$ are the numbers of actions, observations, and rounds, respectively, $a^*$ is an optimal action, and $\Delta_a$ is the suboptimality gap for action $a$. This bound is roughly $\Theta(k^2 \log T)$ times smaller than existing BOBW bounds. In addition, for globally observable games, we provide a new BOBW algorithm with the first $O(\log T)$ stochastic bound.
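
To give a feel for how the stochastic bound in the abstract scales, the following is a minimal sketch that evaluates only the expression inside the $O(\cdot)$, namely $\sum_{a \neq a^*} k^2 m^2 \log T / \Delta_a$. It is not the paper's algorithm. The function name `stochastic_bound_leading_term` is hypothetical, the hidden constant is set to 1 as an assumption, and actions with zero gap are treated as the optimal action $a^*$.

```python
import math


def stochastic_bound_leading_term(gaps, m, T):
    """Evaluate sum over suboptimal actions of k^2 * m^2 * log(T) / gap.

    gaps: suboptimality gaps for all k actions (gap 0 marks the optimal action).
    m:    number of observations.
    T:    number of rounds.
    The constant hidden by the O-notation is assumed to be 1.
    """
    k = len(gaps)  # number of actions
    return sum(k**2 * m**2 * math.log(T) / gap for gap in gaps if gap > 0)
```

For instance, halving every gap doubles the returned value, while the dependence on the number of rounds `T` is only logarithmic.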

Cite

Text

Tsuchiya et al. "Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring." International Conference on Machine Learning, 2024.

Markdown

[Tsuchiya et al. "Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/tsuchiya2024icml-exploration/)

BibTeX

@inproceedings{tsuchiya2024icml-exploration,
  title     = {{Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring}},
  author    = {Tsuchiya, Taira and Ito, Shinji and Honda, Junya},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {48768-48790},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/tsuchiya2024icml-exploration/}
}