Logarithmic Regret from Sublinear Hints

Abstract

We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t \rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent work showed that if an algorithm receives a _hint_ $h_t$ that has non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a regret guarantee of $O(\log T)$, improving on the bound of $\Theta(\sqrt{T})$ in the standard setting. In this work, we study the question of whether an algorithm really requires a hint at _every_ time step. Somewhat surprisingly, we show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model; in contrast, we also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret. We give two applications of our result, to the well-studied setting of _optimistic_ regret bounds, and to the problem of online learning with abstention.
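
To make the protocol in the abstract concrete, here is a minimal Python sketch of online linear optimization with occasional hints. It is **not** the paper's algorithm: it simply runs a standard projected online gradient descent baseline and naively follows a noisy hint on a $\sqrt{T}$-sized set of rounds. The step size, hint-noise level, and fixed query schedule are all illustrative assumptions (the paper studies an adaptive query model).

```python
# Sketch of the hinted online linear optimization setting (not the paper's method).
import numpy as np

def unit_ball_projection(x):
    """Project x onto the Euclidean unit ball."""
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

def run_hinted_olo(T=1000, d=5, hint_budget=None, seed=0):
    rng = np.random.default_rng(seed)
    hint_budget = int(np.sqrt(T)) if hint_budget is None else hint_budget
    # Query hints on a fixed sqrt(T)-sized set of rounds (an assumption for
    # illustration; the paper lets the algorithm choose when to query).
    hint_rounds = set(np.linspace(0, T - 1, hint_budget, dtype=int))

    x = np.zeros(d)              # current iterate of the OGD baseline
    eta = 1.0 / np.sqrt(T)       # standard OGD step size
    total_cost = np.zeros(d)     # running sum of cost vectors, for the comparator
    loss = 0.0

    for t in range(T):
        c = rng.normal(size=d)                      # adversary's cost vector
        c /= np.linalg.norm(c)
        if t in hint_rounds:
            # Hint correlated with c_t (here: a noisy copy); play against it.
            h = unit_ball_projection(c + 0.1 * rng.normal(size=d))
            play = -h / max(np.linalg.norm(h), 1e-12)
        else:
            play = x
        loss += float(c @ play)
        total_cost += c
        # Projected OGD update on the baseline iterate.
        x = unit_ball_projection(x - eta * c)

    # The best fixed point in the unit ball incurs loss -||sum_t c_t||.
    best = -np.linalg.norm(total_cost)
    return loss - best

if __name__ == "__main__":
    print("regret of the naive sketch:", run_hinted_olo())
```

The sketch only illustrates how hints enter the protocol; achieving $O(\log T)$ regret with only $O(\sqrt{T})$ hints requires the algorithm developed in the paper.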

Cite

Text

Bhaskara et al. "Logarithmic Regret from Sublinear Hints." Neural Information Processing Systems, 2021.

Markdown

[Bhaskara et al. "Logarithmic Regret from Sublinear Hints." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/bhaskara2021neurips-logarithmic/)

BibTeX

@inproceedings{bhaskara2021neurips-logarithmic,
  title     = {{Logarithmic Regret from Sublinear Hints}},
  author    = {Bhaskara, Aditya and Cutkosky, Ashok and Kumar, Ravi and Purohit, Manish},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/bhaskara2021neurips-logarithmic/}
}