Exploration via Feature Perturbation in Contextual Bandits

Abstract

We propose *feature perturbation*, a simple yet effective exploration strategy for contextual bandits that injects randomness directly into the feature inputs, rather than randomizing unknown parameters or adding noise to rewards. Remarkably, this algorithm achieves an $\widetilde{\mathcal{O}}(d\sqrt{T})$ worst-case regret bound for generalized linear contextual bandits, while avoiding the $\widetilde{\mathcal{O}}(d^{3/2}\sqrt{T})$ regret typical of existing randomized bandit algorithms. Because our algorithm eschews parameter sampling, it is computationally efficient and extends naturally to non-parametric and neural network models. We verify these advantages through empirical evaluations, demonstrating that feature perturbation not only surpasses existing methods but also combines strong practical performance with near-optimal regret guarantees.
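To make the mechanism concrete, below is a minimal sketch of feature-perturbation exploration for a plain linear contextual bandit with a ridge-regression estimate. It is an illustration under assumptions, not the paper's exact algorithm: the perturbation scale `sigma`, the Gaussian perturbation, and all variable names are ours, and the paper's generalized linear setting is simplified to the linear case.

```python
import numpy as np

# Minimal sketch of feature-perturbation exploration for a *linear*
# contextual bandit (the paper covers generalized linear models).
# Hyperparameters such as the perturbation scale `sigma` are
# illustrative assumptions, not the authors' specification.

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000           # feature dim, arms per round, horizon
sigma = 1.0                     # perturbation scale (assumed)
lam = 1.0                       # ridge regularization

theta_star = rng.normal(size=d) / np.sqrt(d)   # unknown true parameter
A = lam * np.eye(d)             # regularized Gram matrix
b = np.zeros(d)                 # running response vector

for t in range(T):
    X = rng.normal(size=(K, d))            # observed arm features
    theta_hat = np.linalg.solve(A, b)      # ridge estimate

    # Exploration: perturb the *features*, not the parameter estimate.
    X_tilde = X + sigma * rng.normal(size=(K, d))
    arm = int(np.argmax(X_tilde @ theta_hat))

    # Observe a noisy reward for the chosen arm (true, unperturbed features).
    reward = X[arm] @ theta_star + 0.1 * rng.normal()

    # Ridge-regression update with the true features.
    A += np.outer(X[arm], X[arm])
    b += reward * X[arm]
```

Because the randomness enters only through the features, the same recipe applies unchanged if `theta_hat` is replaced by any fitted reward model (e.g., a neural network), which is the extensibility the abstract highlights.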

Cite

Text

Yi and Oh. "Exploration via Feature Perturbation in Contextual Bandits." Advances in Neural Information Processing Systems, 2025.

Markdown

[Yi and Oh. "Exploration via Feature Perturbation in Contextual Bandits." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/yi2025neurips-exploration/)

BibTeX

@inproceedings{yi2025neurips-exploration,
  title     = {{Exploration via Feature Perturbation in Contextual Bandits}},
  author    = {Yi, Seouh-won and Oh, Min-hwan},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/yi2025neurips-exploration/}
}