Batched Nonparametric Bandits via K-Nearest Neighbor UCB

Abstract

We study sequential decision-making in batched nonparametric contextual bandits, where actions are selected over a finite horizon divided into a small number of batches. Motivated by constraints in domains such as medicine and marketing, where online feedback is limited, we propose a nonparametric algorithm that combines adaptive k-nearest neighbor (k-NN) regression with the upper confidence bound (UCB) principle. Our method, BaNk-UCB, is fully nonparametric, adapts to the context density, and is simple to implement. Unlike prior works relying on parametric or binning-based estimators, BaNk-UCB uses local geometry of the contexts to estimate rewards and adaptively balances exploration and exploitation. We provide near-optimal regret guarantees under standard Lipschitz smoothness and margin assumptions, using a theoretically motivated batch schedule that balances regret across batches and achieves minimax-optimal rates. Empirical evaluations on synthetic and real-world datasets demonstrate that BaNk-UCB consistently outperforms binning-based baselines.

Cite

Text

Arya. "Batched Nonparametric Bandits via K-Nearest Neighbor UCB." Transactions on Machine Learning Research, 2025.

Markdown

[Arya. "Batched Nonparametric Bandits via K-Nearest Neighbor UCB." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/arya2025tmlr-batched/)

BibTeX

@article{arya2025tmlr-batched,
  title     = {{Batched Nonparametric Bandits via K-Nearest Neighbor UCB}},
  author    = {Arya, Sakshi},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/arya2025tmlr-batched/}
}