Lipschitz Bandits in Optimal Space

Abstract

This paper considers the Lipschitz bandit problem, where the set of arms is continuous and the expected reward is a Lipschitz function over the arm space. This problem has been extensively studied. Prior algorithms need to store the reward information of all visited arms, leading to significant memory consumption. We address this issue by introducing an algorithm named Log-space Lipschitz bandits (Log-Li), which achieves an optimal (up to logarithmic factors) regret of $\widetilde{O}\left(T^{\frac{d_z+1}{d_z+2}}\right)$ while only uses $O\left(\log T\right)$ bits of memory. Additionally, we provide a complexity analysis for this problem, demonstrating that $\Omega\left(\log T\right)$ bits of space are necessary for any algorithm to achieve the optimal regret. We also conduct numerical simulations, and the results show that our new algorithm achieves regret comparable to the state-of-the-art while reducing memory usage by orders of magnitude.

Cite

Text

Zhu and Huang. "Lipschitz Bandits in Optimal Space." International Conference on Learning Representations, 2025.

Markdown

[Zhu and Huang. "Lipschitz Bandits in Optimal Space." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/zhu2025iclr-lipschitz/)

BibTeX

@inproceedings{zhu2025iclr-lipschitz,
  title     = {{Lipschitz Bandits in Optimal Space}},
  author    = {Zhu, Xiaoyi and Huang, Zengfeng},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/zhu2025iclr-lipschitz/}
}