Experimental Design for Semiparametric Bandits

Abstract

We study finite-armed semiparametric bandits, where each arm’s reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in practice. We propose the first experimental-design approach that simultaneously offers a sharp regret bound, a PAC bound, and a best-arm identification guarantee. Our method attains the minimax regret $\tilde{\mathcal{O}}(\sqrt{dT})$, matching the known lower bound for finite-armed linear bandits, and further achieves logarithmic regret under a positive suboptimality gap condition. These guarantees follow from our refined non-asymptotic analysis of orthogonalized regression that attains the optimal $\sqrt{d}$ rate, paving the way for robust and efficient learning across a broad class of semiparametric bandit problems.

Cite

Text

Kim et al. "Experimental Design for Semiparametric Bandits." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.

Markdown

[Kim et al. "Experimental Design for Semiparametric Bandits." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.](https://mlanthology.org/colt/2025/kim2025colt-experimental/)

BibTeX

@inproceedings{kim2025colt-experimental,
  title     = {{Experimental Design for Semiparametric Bandits}},
  author    = {Kim, Seok-Jin and Kim, Gi-Soo and Oh, Min-hwan},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  year      = {2025},
  pages     = {3215-3252},
  volume    = {291},
  url       = {https://mlanthology.org/colt/2025/kim2025colt-experimental/}
}