Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

Abstract

In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed, but initially unknown $d_1$ by $d_2$ matrix $\Theta^*$ with rank $r \ll \{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, which has been recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrain of existing algorithms on this problem, we first propose the G-ESTT framework that modifies the idea from \cite{jun2019bilinear} by using Stein's method on the subspace estimation and then leverage the estimated subspaces via a regularization idea. Furthermore, we remarkably improve the efficiency of G-ESTT by using a novel exclusion idea on the estimated subspace instead, and propose the G-ESTS framework. We also show that both of our methods are the first algorithm to achieve the optimal $\tilde{O}((d_1+d_2)r\sqrt{T})$ bound of regret presented in \cite{lu2021low} up to logarithm terms under some mild conditions, which improves upon the current regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT})$~\citep{lu2021low}. For completeness, we conduct experiments to illustrate that our proposed algorithms, especially G-ESTS, are also computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods based on a suite of simulations.

Cite

Text

Kang et al. "Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems." Neural Information Processing Systems, 2022.

Markdown

[Kang et al. "Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/kang2022neurips-efficient/)

BibTeX

@inproceedings{kang2022neurips-efficient,
  title     = {{Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems}},
  author    = {Kang, Yue and Hsieh, Cho-Jui and Lee, Thomas Chun Man},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/kang2022neurips-efficient/}
}