Scalable Bilinear Pi Learning Using State and Action Features

Abstract

Approximate linear programming (ALP) represents one of the major algorithmic families to solve large-scale Markov decision processes (MDPs). In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts linear and bilinear models to represent the high-dimensional value function and state-action distributions, respectively, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDP. Second, it operates in a fully online fashion without having to store any sample, thus having a minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.
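The sketch below illustrates the kind of parametrization and online primal-dual update the abstract describes: a value function linear in state features and a state-action occupancy measure bilinear in state and action features, both updated from a single sampled transition. The feature dimensions, step size, residual-based update rule, and projection used here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Hypothetical dimensions and hyperparameters (assumptions, not from the paper).
d_s, d_a = 8, 4      # number of state and action features
gamma = 0.95         # discount factor
eta = 0.01           # step size

# Linear value model: V(s) ~= phi(s) @ w
w = np.zeros(d_s)
# Bilinear occupancy model: mu(s, a) ~= phi(s) @ M @ psi(a)
M = np.full((d_s, d_a), 1.0 / (d_s * d_a))

def primal_dual_step(phi_s, psi_a, reward, phi_s_next):
    """One illustrative stochastic primal-dual update from a transition
    (s, a, r, s') drawn via the sampling oracle; sample is then discarded."""
    global w, M
    # Bellman-style residual under the current linear value estimate.
    delta = reward + gamma * phi_s_next @ w - phi_s @ w
    # Dual (occupancy) ascent: shift mass toward state-action pairs
    # with positive residual, then project back to a distribution.
    M = np.clip(M + eta * delta * np.outer(phi_s, psi_a), 0.0, None)
    M /= M.sum()
    # Primal (value) descent, weighted by the estimated occupancy of (s, a).
    # (The initial-state term of the full Lagrangian is omitted for brevity.)
    mu_sa = phi_s @ M @ psi_a
    w -= eta * mu_sa * (gamma * phi_s_next - phi_s)

# Usage with random placeholder features standing in for the oracle's samples.
rng = np.random.default_rng(0)
primal_dual_step(rng.standard_normal(d_s), rng.standard_normal(d_a),
                 reward=1.0, phi_s_next=rng.standard_normal(d_s))
```

Because each update touches only the $d_s \times d_a$ parameter matrix and the $d_s$-dimensional weight vector, per-step cost depends on the number of features rather than the size of the state and action spaces, which is the scalability property the abstract emphasizes.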

Cite

Text

Chen et al. "Scalable Bilinear Pi Learning Using State and Action Features." International Conference on Machine Learning, 2018.

Markdown

[Chen et al. "Scalable Bilinear Pi Learning Using State and Action Features." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/chen2018icml-scalable/)

BibTeX

@inproceedings{chen2018icml-scalable,
  title     = {{Scalable Bilinear Pi Learning Using State and Action Features}},
  author    = {Chen, Yichen and Li, Lihong and Wang, Mengdi},
  booktitle = {International Conference on Machine Learning},
  year      = {2018},
  pages     = {834--843},
  volume    = {80},
  url       = {https://mlanthology.org/icml/2018/chen2018icml-scalable/}
}