No-Regret Linear Bandits Beyond Realizability

Abstract

We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation, which results in an unavoidable linear regret whenever $\epsilon > 0$. We describe a more natural model of misspecification that only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$. It captures the intuition that, for optimization problems, near-optimal regions matter more, so larger approximation errors can be tolerated in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm — designed for the realizable case — is automatically robust against such gap-adjusted misspecification. It achieves a near-optimal $\sqrt{T}$ regret on problems where the best previously known regret bound is almost linear in the time horizon $T$. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
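
To make the gap-adjusted condition concrete, one natural way to formalize it (the notation $\rho$, $\theta_*$, and $\phi$ below is ours, not fixed by the abstract) is to require a single linear predictor whose pointwise error is controlled by the suboptimality gap:

$$|f(x) - \langle \theta_*, \phi(x) \rangle| \le \rho \Big( \max_{x'} f(x') - f(x) \Big) \quad \text{for all } x, \qquad \rho \in [0, 1),$$

so the linear approximation must be accurate near the optimum but may degrade on clearly suboptimal inputs.

The LinUCB algorithm referenced above is the standard optimism-in-the-face-of-uncertainty method for linear bandits. The sketch below is a minimal, illustrative implementation over a finite arm set, assuming a ridge-regularized least-squares estimate and a fixed confidence-width multiplier beta; it is not the calibrated version analyzed in the paper.

```python
import numpy as np

def lin_ucb(features, reward_fn, T, lam=1.0, beta=1.0):
    """Classical LinUCB over a finite arm set.

    features:  (K, d) array, one feature vector phi(x) per arm.
    reward_fn: callable arm_index -> noisy reward (possibly non-linear in phi).
    beta:      exploration-bonus multiplier (a tuning constant here, not the
               theoretically calibrated confidence radius from the paper).
    """
    K, d = features.shape
    V = lam * np.eye(d)   # regularized design matrix
    b = np.zeros(d)       # running sum of phi_s * r_s
    rewards = []
    for _ in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b  # ridge-regression estimate of theta
        # Optimistic index: estimated mean + bonus proportional to ||phi||_{V^{-1}}.
        bonus = np.sqrt(np.einsum("kd,de,ke->k", features, V_inv, features))
        a = int(np.argmax(features @ theta_hat + beta * bonus))
        r = reward_fn(a)
        phi = features[a]
        V += np.outer(phi, phi)
        b += r * phi
        rewards.append(r)
    return rewards
```

The point of the paper is that this unmodified rule, run exactly as in the realizable case, still enjoys $\sqrt{T}$ regret under the gap-adjusted misspecification condition above.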

Cite

Text

Liu et al. "No-Regret Linear Bandits Beyond Realizability." Uncertainty in Artificial Intelligence, 2023.

Markdown

[Liu et al. "No-Regret Linear Bandits Beyond Realizability." Uncertainty in Artificial Intelligence, 2023.](https://mlanthology.org/uai/2023/liu2023uai-noregret/)

BibTeX

@inproceedings{liu2023uai-noregret,
  title     = {{No-Regret Linear Bandits Beyond Realizability}},
  author    = {Liu, Chong and Yin, Ming and Wang, Yu-Xiang},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2023},
  pages     = {1294--1303},
  volume    = {216},
  url       = {https://mlanthology.org/uai/2023/liu2023uai-noregret/}
}