Approximate Policy Iteration Using Large-Margin Classifiers

Abstract

We present an approximate policy iteration algorithm that uses rollouts to estimate the value of each action under a given policy in a subset of states and a classifier to generalize and learn the improved policy over the entire state space. Using a multiclass support vector machine as the classifier, we obtained successful results on the inverted pendulum and the bicycle balancing and riding domains.
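The scheme in the abstract can be sketched in a few lines: sample some states, estimate each action's value under the current policy via Monte Carlo rollouts, label each sampled state with its best action, and fit a classifier to those state–action pairs to get the improved policy. The sketch below is a minimal illustration on a hypothetical toy chain MDP (not one of the paper's domains), and it substitutes a trivial 1-nearest-neighbor rule for the multiclass SVM the paper uses; all names and parameters are illustrative assumptions.

```python
import random

# Hypothetical toy chain MDP: states 0..N-1, actions step left/right,
# reward 1 for reaching the rightmost state. Stand-in for the paper's domains.
N_STATES = 10
ACTIONS = (-1, +1)
GAMMA = 0.9

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def rollout(s, a, policy, horizon=20):
    """One truncated rollout estimating Q^pi(s, a): take action a, then follow policy."""
    s, r = step(s, a)
    total, disc = r, GAMMA
    for _ in range(horizon):
        s, r = step(s, policy(s))
        total += disc * r
        disc *= GAMMA
    return total

def nearest_neighbor_policy(examples):
    """Tiny 1-NN rule generalizing the sampled (state, best action) pairs.
    The paper uses a multiclass SVM here instead."""
    def policy(s):
        s0, a0 = min(examples, key=lambda e: abs(e[0] - s))
        return a0
    return policy

def api_iteration(n_iters=5, n_samples=8, n_rollouts=5):
    policy = lambda s: random.choice(ACTIONS)  # arbitrary initial policy
    for _ in range(n_iters):
        examples = []
        for s in random.sample(range(N_STATES), n_samples):
            # Average several rollouts per action to estimate Q-values at s.
            q = {a: sum(rollout(s, a, policy) for _ in range(n_rollouts)) / n_rollouts
                 for a in ACTIONS}
            examples.append((s, max(q, key=q.get)))  # label s with the greedy action
        policy = nearest_neighbor_policy(examples)   # generalize to all states
    return policy
```

Each iteration performs a policy-improvement step at the sampled states only; the classifier supplies the improved policy everywhere else, which is what lets the method avoid an explicit value-function representation over the full state space.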

Cite

Text

Lagoudakis and Parr. "Approximate Policy Iteration Using Large-Margin Classifiers." International Joint Conference on Artificial Intelligence, 2003.

Markdown

[Lagoudakis and Parr. "Approximate Policy Iteration Using Large-Margin Classifiers." International Joint Conference on Artificial Intelligence, 2003.](https://mlanthology.org/ijcai/2003/lagoudakis2003ijcai-approximate/)

BibTeX

@inproceedings{lagoudakis2003ijcai-approximate,
  title     = {{Approximate Policy Iteration Using Large-Margin Classifiers}},
  author    = {Lagoudakis, Michail G. and Parr, Ronald},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2003},
  pages     = {1432--1434},
  url       = {https://mlanthology.org/ijcai/2003/lagoudakis2003ijcai-approximate/}
}