Logarithmic Regret from Sublinear Hints

Abstract

We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t \rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent work showed that if an algorithm receives a _hint_ $h_t$ that has non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a regret guarantee of $O(\log T)$, improving on the bound of $\Theta(\sqrt{T})$ in the standard setting. In this work, we study the question of whether an algorithm really requires a hint at _every_ time step. Somewhat surprisingly, we show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model; in contrast, we also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret. We give two applications of our result, to the well-studied setting of _optimistic_ regret bounds, and to the problem of online learning with abstention.
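
To make the protocol in the abstract concrete, here is a minimal Python sketch of online linear optimization with occasional hints. It is **not** the paper's algorithm: it simply runs a standard projected online gradient descent baseline and naively follows a noisy hint on a $\sqrt{T}$-sized set of rounds. The step size, hint-noise level, and fixed query schedule are all illustrative assumptions (the paper studies an adaptive query model).

```python
# Sketch of the hinted online linear optimization setting (not the paper's method).
import numpy as np

def unit_ball_projection(x):
    """Project x onto the Euclidean unit ball."""
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

def run_hinted_olo(T=1000, d=5, hint_budget=None, seed=0):
    rng = np.random.default_rng(seed)
    hint_budget = int(np.sqrt(T)) if hint_budget is None else hint_budget
    # Query hints on a fixed sqrt(T)-sized set of rounds (an assumption for
    # illustration; the paper lets the algorithm choose when to query).
    hint_rounds = set(np.linspace(0, T - 1, hint_budget, dtype=int))

    x = np.zeros(d)              # current iterate of the OGD baseline
    eta = 1.0 / np.sqrt(T)       # standard OGD step size
    total_cost = np.zeros(d)     # running sum of cost vectors, for the comparator
    loss = 0.0

    for t in range(T):
        c = rng.normal(size=d)                      # adversary's cost vector
        c /= np.linalg.norm(c)
        if t in hint_rounds:
            # Hint correlated with c_t (here: a noisy copy); play against it.
            h = unit_ball_projection(c + 0.1 * rng.normal(size=d))
            play = -h / max(np.linalg.norm(h), 1e-12)
        else:
            play = x
        loss += float(c @ play)
        total_cost += c
        # Projected OGD update on the baseline iterate.
        x = unit_ball_projection(x - eta * c)

    # The best fixed point in the unit ball incurs loss -||sum_t c_t||.
    best = -np.linalg.norm(total_cost)
    return loss - best

if __name__ == "__main__":
    print("regret of the naive sketch:", run_hinted_olo())
```

The sketch only illustrates how hints enter the protocol; achieving $O(\log T)$ regret with only $O(\sqrt{T})$ hints requires the algorithm developed in the paper.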

Cite

Text

Bhaskara et al. "Logarithmic Regret from Sublinear Hints." Neural Information Processing Systems, 2021.

Markdown

[Bhaskara et al. "Logarithmic Regret from Sublinear Hints." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/bhaskara2021neurips-logarithmic/)

BibTeX

@inproceedings{bhaskara2021neurips-logarithmic,
  title     = {{Logarithmic Regret from Sublinear Hints}},
  author    = {Bhaskara, Aditya and Cutkosky, Ashok and Kumar, Ravi and Purohit, Manish},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/bhaskara2021neurips-logarithmic/}
}