Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
Abstract
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find \textit{realizable} representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has been recently shown that representations with certain spectral properties (called \textit{HLS}) may be more effective for the exploration-exploitation task, enabling \textit{LinUCB} to achieve constant (i.e., horizon-independent) regret. In this paper, we propose \textsc{BanditSRL}, a representation learning algorithm that combines a novel constrained optimization problem to learn a realizable representation with good spectral properties with a generalized likelihood ratio test to exploit the recovered representation and avoid excessive exploration. We prove that \textsc{BanditSRL} can be paired with any no-regret algorithm and achieve constant regret whenever an \textit{HLS} representation is available. Furthermore, \textsc{BanditSRL} can be easily combined with deep neural networks and we show how regularizing towards \textit{HLS} representations is beneficial in standard benchmarks.
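The spectral property mentioned above can be illustrated numerically. The abstract does not define HLS, so the sketch below assumes the definition used in prior work on HLS representations: the features of the optimal actions must span the whole feature space, i.e. the second-moment matrix of the optimal features has a strictly positive minimum eigenvalue. The function name `hls_min_eigenvalue`, the finite context set, and the toy feature arrays are hypothetical and chosen only for illustration; this is not the paper's algorithm.

```python
import numpy as np

def hls_min_eigenvalue(features, theta, context_probs=None):
    """Minimum eigenvalue of E_x[phi*(x) phi*(x)^T] over a finite context set.

    features: array of shape (n_contexts, n_actions, d), hypothetical phi(x, a)
    theta: array of shape (d,), reward parameter (realizability is assumed)
    context_probs: optional context distribution; uniform if omitted
    """
    n_contexts, _, _ = features.shape
    if context_probs is None:
        context_probs = np.full(n_contexts, 1.0 / n_contexts)
    rewards = features @ theta                       # (n_contexts, n_actions)
    best = rewards.argmax(axis=1)                    # optimal action per context
    opt_feats = features[np.arange(n_contexts), best]  # (n_contexts, d)
    # Weighted second-moment matrix of the optimal features.
    moment = (opt_feats * context_probs[:, None]).T @ opt_feats
    return float(np.linalg.eigvalsh(moment)[0])      # eigenvalues are ascending

theta = np.array([1.0, 1.0])

# Optimal features are [1, 0] and [0, 1]: they span R^2 (assumed HLS condition holds).
spanning = np.array([[[1.0, 0.0], [0.0, 0.5]],
                     [[0.0, 1.0], [0.5, 0.0]]])

# Optimal features are [1, 0] and [2, 0]: they span only one direction.
degenerate = np.array([[[1.0, 0.0], [0.0, 0.5]],
                       [[2.0, 0.0], [0.0, 1.0]]])

lam_spanning = hls_min_eigenvalue(spanning, theta)
lam_degenerate = hls_min_eigenvalue(degenerate, theta)
print(lam_spanning > 1e-9, lam_degenerate > 1e-9)
```

Under this assumed definition, a strictly positive value indicates that playing optimal actions alone keeps shrinking uncertainty in every direction of the feature space, which is the intuition behind horizon-independent regret.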
Cite
Text
Tirinzoni et al. "Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees." Neural Information Processing Systems, 2022.
Markdown
[Tirinzoni et al. "Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/tirinzoni2022neurips-scalable/)
BibTeX
@inproceedings{tirinzoni2022neurips-scalable,
title = {{Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees}},
author = {Tirinzoni, Andrea and Papini, Matteo and Touati, Ahmed and Lazaric, Alessandro and Pirotta, Matteo},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/tirinzoni2022neurips-scalable/}
}