Personalizing Many Decisions with High-Dimensional Covariates

Abstract

We consider the k-armed stochastic contextual bandit problem with d dimensional features, when both k and d can be large. To the best of our knowledge, all existing algorithm for this problem have a regret bound that scale as polynomials of degree at least two in k and d. The main contribution of this paper is to introduce and theoretically analyze a new algorithm (REAL Bandit) with a regret that scales by r^2(k+d) when r is rank of the k by d matrix of unknown parameters. REAL Bandit relies on ideas from low-rank matrix estimation literature and a new row-enhancement subroutine that yields sharper bounds for estimating each row of the parameter matrix that may be of independent interest.

Cite

Text

Hamidi et al. "Personalizing Many Decisions with High-Dimensional Covariates." Neural Information Processing Systems, 2019.

Markdown

[Hamidi et al. "Personalizing Many Decisions with High-Dimensional Covariates." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/hamidi2019neurips-personalizing/)

BibTeX

@inproceedings{hamidi2019neurips-personalizing,
  title     = {{Personalizing Many Decisions with High-Dimensional Covariates}},
  author    = {Hamidi, Nima and Bayati, Mohsen and Gupta, Kapil},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {11473-11484},
  url       = {https://mlanthology.org/neurips/2019/hamidi2019neurips-personalizing/}
}