Approximate Policy Iteration with Linear Action Models

Abstract

In this paper we consider the problem of finding a good policy given some batch data. We propose a new approach, LAM-API, which first builds a so-called linear action model (LAM) from the data and then uses the learned model and the collected data in approximate policy iteration (API) to find a good policy. A natural choice for the policy evaluation step in this algorithm is the least-squares temporal difference (LSTD) learning algorithm. Empirical results on three benchmark problems show that this particular instance of LAM-API performs competitively with LSPI in terms of both data and computational efficiency.
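To make the abstract's description concrete, here is a minimal sketch of the LAM-API idea, not the authors' reference implementation. It assumes a fixed feature map applied offline to a batch of (s, a, r, s') tuples, regularized least-squares fits for the per-action linear models, and an LSTD-style evaluation of the greedy policy against the learned model; the function names (`learn_lam`, `lstd_evaluate`, `lam_api`) and the regularization constant are illustrative choices, not from the paper.

```python
import numpy as np

def learn_lam(phis, actions, rewards, next_phis, n_actions, reg=1e-3):
    """Fit a linear action model per action a:
    F_a @ phi(s) ~ E[phi(s')] and f_a . phi(s) ~ E[r]."""
    d = phis.shape[1]
    F, f = [], []
    for a in range(n_actions):
        idx = actions == a
        X, Xn, R = phis[idx], next_phis[idx], rewards[idx]
        G = X.T @ X + reg * np.eye(d)              # regularized Gram matrix
        F.append(np.linalg.solve(G, X.T @ Xn).T)   # d x d feature-transition model
        f.append(np.linalg.solve(G, X.T @ R))      # d-dimensional reward model
    return F, f

def greedy_action(phi, theta, F, f, gamma):
    """One-step lookahead through the learned model under value weights theta."""
    q = [f[a] @ phi + gamma * theta @ (F[a] @ phi) for a in range(len(F))]
    return int(np.argmax(q))

def lstd_evaluate(phis, theta, F, f, gamma, reg=1e-3):
    """LSTD-style evaluation of the greedy policy, with next features and
    rewards supplied by the linear action model instead of sampled transitions."""
    d = phis.shape[1]
    A = reg * np.eye(d)
    b = np.zeros(d)
    for phi in phis:
        a = greedy_action(phi, theta, F, f, gamma)
        A += np.outer(phi, phi - gamma * F[a] @ phi)
        b += phi * (f[a] @ phi)
    return np.linalg.solve(A, b)

def lam_api(phis, actions, rewards, next_phis, n_actions, gamma=0.95, iters=20):
    """Approximate policy iteration: learn the LAM once, then alternate
    greedy improvement and LSTD evaluation until the weights settle."""
    F, f = learn_lam(phis, actions, rewards, next_phis, n_actions)
    theta = np.zeros(phis.shape[1])
    for _ in range(iters):
        theta = lstd_evaluate(phis, theta, F, f, gamma)
    return theta, F, f
```

In this sketch the batch data is used twice, as the abstract suggests: once to fit the per-action models (F_a, f_a), and again as the set of states over which LSTD evaluates each greedy policy.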

Cite

Text

Yao and Szepesvári. "Approximate Policy Iteration with Linear Action Models." AAAI Conference on Artificial Intelligence, 2012. doi:10.1609/AAAI.V26I1.8319

Markdown

[Yao and Szepesvári. "Approximate Policy Iteration with Linear Action Models." AAAI Conference on Artificial Intelligence, 2012.](https://mlanthology.org/aaai/2012/yao2012aaai-approximate/) doi:10.1609/AAAI.V26I1.8319

BibTeX

@inproceedings{yao2012aaai-approximate,
  title     = {{Approximate Policy Iteration with Linear Action Models}},
  author    = {Yao, Hengshuai and Szepesvári, Csaba},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2012},
  pages     = {1212--1218},
  doi       = {10.1609/AAAI.V26I1.8319},
  url       = {https://mlanthology.org/aaai/2012/yao2012aaai-approximate/}
}