Approximate Policy Iteration Using Large-Margin Classifiers
Abstract
We present an approximate policy iteration algorithm that uses rollouts to estimate the value of each action under a given policy in a subset of states and a classifier to generalize and learn the improved policy over the entire state space. Using a multiclass support vector machine as the classifier, we obtained successful results on the inverted pendulum and the bicycle balancing and riding domains.
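The abstract's loop (estimate each action's value at sampled states via rollouts, label each state with its best action, train a classifier on those labels to obtain the improved policy) can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation: the chain MDP, the constants, and the 1-nearest-neighbour rule standing in for the multiclass SVM are all assumptions introduced here.

```python
# Rollout-based approximate policy iteration on a toy chain MDP.
# States 0..N-1, actions 0 (left) / 1 (right); reward 1.0 on reaching
# state N-1, which is treated as absorbing. All names and constants
# below are illustrative, not from the paper.
N, H, GAMMA = 10, 20, 0.95

def step(s, a):
    """Deterministic transition: move left or right, clamped to the chain."""
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N - 1 else 0.0)

def rollout(s, a, policy, horizon=H):
    """Monte Carlo estimate of Q^pi(s, a): take a once, then follow pi."""
    s, r = step(s, a)
    total, discount = r, 1.0
    for _ in range(horizon):
        if s == N - 1:          # goal state is absorbing
            break
        discount *= GAMMA
        s, r = step(s, policy(s))
        total += discount * r
    return total

def nn_policy(examples):
    """Generalize (state, best action) labels over the whole state space.
    A 1-nearest-neighbour rule stands in for the paper's multiclass SVM."""
    def policy(s):
        return min(examples, key=lambda e: abs(e[0] - s))[1]
    return policy

def approximate_policy_iteration(iterations=10, n_rollouts=1):
    policy = lambda s: 0                      # start from "always left"
    for _ in range(iterations):
        examples = []
        for s in range(N - 1):                # sampled subset of states
            q = [sum(rollout(s, a, policy) for _ in range(n_rollouts)) / n_rollouts
                 for a in (0, 1)]
            examples.append((s, max((0, 1), key=lambda a: q[a])))
        policy = nn_policy(examples)          # classifier = improved policy
    return policy
```

In this deterministic toy a single rollout per state-action pair suffices, and the improved "go right" labels propagate back one state per iteration; a stochastic domain like the inverted pendulum or bicycle would need many rollouts per pair, and the large-margin SVM replaces the nearest-neighbour rule.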
Cite
Text
Lagoudakis and Parr. "Approximate Policy Iteration Using Large-Margin Classifiers." International Joint Conference on Artificial Intelligence, 2003.

Markdown
[Lagoudakis and Parr. "Approximate Policy Iteration Using Large-Margin Classifiers." International Joint Conference on Artificial Intelligence, 2003.](https://mlanthology.org/ijcai/2003/lagoudakis2003ijcai-approximate/)

BibTeX
@inproceedings{lagoudakis2003ijcai-approximate,
title = {{Approximate Policy Iteration Using Large-Margin Classifiers}},
author = {Lagoudakis, Michail G. and Parr, Ronald},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2003},
pages = {1432--1434},
url = {https://mlanthology.org/ijcai/2003/lagoudakis2003ijcai-approximate/}
}