Improved Regret Bounds for Bandits with Expert Advice

Abstract

In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order [KT ln(N/K)]^{1/2} for the worst-case regret, where K is the number of actions, N > K the number of experts, and T the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of [KT ln(N)/ln(K)]^{1/2}. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
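For readability, the two lower bounds mentioned in the abstract can be restated in display math (a restatement only; asymptotic notation is assumed and constant factors are omitted):

```latex
% New lower bound proved in this note (restricted feedback model)
% versus the previously best available lower bound.
\[
  \Omega\!\left(\sqrt{K T \ln(N/K)}\right)
  \qquad \text{vs.} \qquad
  \Omega\!\left(\sqrt{K T \,\ln(N)/\ln(K)}\right),
\]
% where K = number of actions, N > K = number of experts,
% and T = time horizon.
```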

Cite

Text

Cesa-Bianchi et al. "Improved Regret Bounds for Bandits with Expert Advice." Journal of Artificial Intelligence Research, 2025. doi:10.1613/JAIR.1.16738

Markdown

[Cesa-Bianchi et al. "Improved Regret Bounds for Bandits with Expert Advice." Journal of Artificial Intelligence Research, 2025.](https://mlanthology.org/jair/2025/cesabianchi2025jair-improved/) doi:10.1613/JAIR.1.16738

BibTeX

@article{cesabianchi2025jair-improved,
  title     = {{Improved Regret Bounds for Bandits with Expert Advice}},
  author    = {Cesa-Bianchi, Nicolò and Eldowa, Khaled and Esposito, Emmanuel and Olkhovskaya, Julia},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2025},
  doi       = {10.1613/JAIR.1.16738},
  volume    = {83},
  url       = {https://mlanthology.org/jair/2025/cesabianchi2025jair-improved/}
}