Model-Free Low-Rank Reinforcement Learning via Leveraged Entry-Wise Matrix Estimation

Abstract

We consider the problem of learning an $\varepsilon$-optimal policy in controlled dynamical systems with low-rank latent structure. For this problem, we present LoRa-PI (Low-Rank Policy Iteration), a model-free learning algorithm alternating between policy improvement and policy evaluation steps. In the latter, the algorithm estimates the low-rank matrix corresponding to the (state, action) value function of the current policy using the following two-phase procedure. The entries of the matrix are first sampled uniformly at random to estimate, via a spectral method, the *leverage scores* of its rows and columns. These scores are then used to extract a few important rows and columns whose entries are further sampled. The algorithm exploits these new samples to complete the matrix estimation using a CUR-like method. For this leveraged matrix estimation procedure, we establish entry-wise guarantees that, remarkably, do not depend on the coherence of the matrix but only on its spikiness. These guarantees imply that LoRa-PI learns an $\varepsilon$-optimal policy using $\tilde{\mathcal{O}}\left(\frac{S+A}{\mathrm{poly}(1-\gamma)\,\varepsilon^2}\right)$ samples, where $S$ (resp. $A$) denotes the number of states (resp. actions) and $\gamma$ the discount factor. Our algorithm achieves this order-optimal (in $S$, $A$ and $\varepsilon$) sample complexity under milder conditions than those assumed in previously proposed approaches.
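The two-phase estimation procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`leverage_scores`, `leveraged_cur_estimate`), the `sample_entry` oracle, the mixing of leverage-proportional and uniform selection probabilities, and the rank-truncated pseudo-inverse in the CUR step are all assumptions made for illustration, and the policy-iteration loop around the estimator is omitted.

```python
import numpy as np


def leverage_scores(M_hat, rank):
    """Row and column leverage scores of the top-`rank` singular subspaces."""
    U, _, Vt = np.linalg.svd(M_hat, full_matrices=False)
    row_lev = np.sum(U[:, :rank] ** 2, axis=1)   # squared row norms of U_r
    col_lev = np.sum(Vt[:rank, :] ** 2, axis=0)  # squared column norms of V_r^T
    return row_lev, col_lev


def leveraged_cur_estimate(sample_entry, n_rows, n_cols, rank, p_uniform, k, rng):
    """Hypothetical two-phase, CUR-like estimate of a low-rank matrix.

    sample_entry(i, j) is an assumed oracle returning a (possibly noisy)
    observation of entry (i, j); k >= rank rows and columns are selected.
    """
    # Phase 1: observe entries uniformly at random, zero-fill and rescale,
    # then estimate leverage scores with a spectral (SVD) step.
    mask = rng.random((n_rows, n_cols)) < p_uniform
    M_obs = np.zeros((n_rows, n_cols))
    for i, j in np.argwhere(mask):
        M_obs[i, j] = sample_entry(i, j)
    M_obs /= p_uniform
    row_lev, col_lev = leverage_scores(M_obs, rank)

    # Phase 2: pick k rows and k columns, favouring high leverage scores.
    # Assumption: probabilities are mixed with uniform to keep them positive.
    p_rows = 0.5 * row_lev / row_lev.sum() + 0.5 / n_rows
    p_cols = 0.5 * col_lev / col_lev.sum() + 0.5 / n_cols
    rows = rng.choice(n_rows, size=k, replace=False, p=p_rows)
    cols = rng.choice(n_cols, size=k, replace=False, p=p_cols)

    # Sample the selected columns (C) and rows (R) in full.
    C = np.array([[sample_entry(i, j) for j in cols] for i in range(n_rows)])
    R = np.array([[sample_entry(i, j) for j in range(n_cols)] for i in rows])
    W = C[rows, :]  # k x k intersection of selected rows and columns

    # CUR-like completion: M is approximated by C W^+ R, where W^+ is a
    # pseudo-inverse truncated to the top `rank` singular values.
    Uw, sw, Vwt = np.linalg.svd(W)
    W_pinv = (Vwt[:rank].T / sw[:rank]) @ Uw[:, :rank].T
    return C @ W_pinv @ R
```

In a noiseless setting with an exactly rank-`rank` matrix, this estimate recovers the matrix whenever the intersection block `W` also has rank `rank`. With noisy observations, one would typically average several samples per selected entry before forming `C` and `R`.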

Cite

Text

Stojanovic et al. "Model-Free Low-Rank Reinforcement Learning via Leveraged Entry-Wise Matrix Estimation." Neural Information Processing Systems, 2024. doi:10.52202/079017-0972

Markdown

[Stojanovic et al. "Model-Free Low-Rank Reinforcement Learning via Leveraged Entry-Wise Matrix Estimation." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/stojanovic2024neurips-modelfree/) doi:10.52202/079017-0972

BibTeX

@inproceedings{stojanovic2024neurips-modelfree,
  title     = {{Model-Free Low-Rank Reinforcement Learning via Leveraged Entry-Wise Matrix Estimation}},
  author    = {Stojanovic, Stefan and Jedra, Yassir and Proutiere, Alexandre},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0972},
  url       = {https://mlanthology.org/neurips/2024/stojanovic2024neurips-modelfree/}
}