Open Problem: First-Order Regret Bounds for Contextual Bandits
Abstract
We describe two open problems related to first-order regret bounds for contextual bandits. The first asks for an algorithm with a regret bound of $\tilde{\mathcal{O}}(\sqrt{L_\star K \ln N})$, where there are $K$ actions, $N$ policies, and $L_\star$ is the cumulative loss of the best policy. The second asks for an optimization-oracle-efficient algorithm with regret $\tilde{\mathcal{O}}(L_\star^{2/3}\,\mathrm{poly}(K, \ln(N/\delta)))$. We describe some positive results, such as an inefficient algorithm for the second problem, and some partial negative results.
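The quantity $L_\star$ in the abstract can be made explicit; as a sketch using standard contextual-bandit notation (not copied from the paper): over $T$ rounds, the learner sees context $x_t$, picks action $a_t$, and suffers loss $\ell_t(a_t) \in [0,1]$, competing with a policy class $\Pi$ of size $N$.

```latex
% Cumulative loss of the best fixed policy, and the regret it induces
L_\star \;=\; \min_{\pi \in \Pi} \sum_{t=1}^{T} \ell_t(\pi(x_t)),
\qquad
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \ell_t(a_t) \;-\; L_\star .
```

A first-order ("small-loss") bound replaces the worst-case $\sqrt{T}$ dependence with a dependence on $L_\star$, so the guarantee improves whenever the best policy performs well.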
Cite
Agarwal et al. "Open Problem: First-Order Regret Bounds for Contextual Bandits." Proceedings of the 2017 Conference on Learning Theory, 2017.
BibTeX
@inproceedings{agarwal2017colt-open,
title = {{Open Problem: First-Order Regret Bounds for Contextual Bandits}},
author = {Agarwal, Alekh and Krishnamurthy, Akshay and Langford, John and Luo, Haipeng and Schapire, Robert E.},
booktitle = {Proceedings of the 2017 Conference on Learning Theory},
year = {2017},
pages = {4--7},
volume = {65},
url = {https://mlanthology.org/colt/2017/agarwal2017colt-open/}
}