Open Problem: First-Order Regret Bounds for Contextual Bandits

Abstract

We describe two open problems related to first-order regret bounds for contextual bandits. The first asks for an algorithm with a regret bound of $\tilde{\mathcal{O}}(\sqrt{L_\star K \ln N})$, where there are $K$ actions, $N$ policies, and $L_\star$ is the cumulative loss of the best policy. The second asks for an optimization-oracle-efficient algorithm with regret $\tilde{\mathcal{O}}(L_\star^{2/3}\,\mathrm{poly}(K, \ln(N/\delta)))$. We describe some positive results, such as an inefficient algorithm for the second problem, and some partial negative results.
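To see why a first-order bound is desirable, the following sketch (with hypothetical numbers, not taken from the paper) compares the standard worst-case bound $\tilde{\mathcal{O}}(\sqrt{KT \ln N})$ for contextual bandits against the first-order bound $\tilde{\mathcal{O}}(\sqrt{L_\star K \ln N})$ sought in the first open problem:

```python
import math

# Illustrative comparison only; all parameter values are hypothetical.
K = 10        # number of actions
N = 1000      # number of policies
T = 100_000   # number of rounds
L_star = 500  # cumulative loss of the best policy (assumed small)

# Standard worst-case regret bound: O(sqrt(K * T * ln N)).
worst_case = math.sqrt(K * T * math.log(N))

# First-order regret bound: O(sqrt(L_star * K * ln N)).
# It replaces the horizon T with the best policy's loss L_star.
first_order = math.sqrt(L_star * K * math.log(N))

print(f"worst-case bound  ~ {worst_case:.0f}")
print(f"first-order bound ~ {first_order:.0f}")
```

Since $L_\star \le T$ always holds, the first-order bound is never worse, and it is far smaller whenever the best policy incurs little loss.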

Cite

Agarwal et al. "Open Problem: First-Order Regret Bounds for Contextual Bandits." Proceedings of the 2017 Conference on Learning Theory, 2017.

BibTeX

@inproceedings{agarwal2017colt-open,
  title     = {{Open Problem: First-Order Regret Bounds for Contextual Bandits}},
  author    = {Agarwal, Alekh and Krishnamurthy, Akshay and Langford, John and Luo, Haipeng and Schapire, Robert E.},
  booktitle = {Proceedings of the 2017 Conference on Learning Theory},
  year      = {2017},
  pages     = {4--7},
  volume    = {65},
  url       = {https://mlanthology.org/colt/2017/agarwal2017colt-open/}
}