Preference-Based Reinforcement Learning: Evolutionary Direct Policy Search Using a Preference-Based Racing Algorithm
Abstract
We introduce a novel approach to preference-based reinforcement learning, namely a preference-based variant of a direct policy search method based on evolutionary optimization. The core of our approach is a preference-based racing algorithm that selects the best among a given set of candidate policies with high probability. To this end, the algorithm operates on a suitable ordinal preference structure and only uses pairwise comparisons between sample rollouts of the policies. Embedding the racing algorithm in a rank-based evolutionary search procedure, we show that approximations of the so-called Smith set of optimal policies can be produced with certain theoretical guarantees. Apart from a formal performance and complexity analysis, we present first experimental studies showing that our approach performs well in practice.
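The central idea of the abstract — selecting the best among candidate policies using only pairwise comparisons of sample rollouts, with high-probability guarantees — can be illustrated with a toy racing loop. The sketch below is a hedged illustration, not the paper's PBR algorithm: policy names, return distributions, and the Hoeffding-style elimination rule are all assumptions made for the example.

```python
import math
import random

def race(policies, max_rounds=5000, delta=0.05, seed=0):
    """Toy preference-based racing (illustrative sketch, not the paper's PBR):
    repeatedly compare one sample rollout per surviving policy and eliminate
    a policy once some rival beats it with high confidence.
    `policies` maps names to callables drawing one stochastic rollout return."""
    rng = random.Random(seed)
    alive = set(policies)
    wins = {(a, b): 0 for a in policies for b in policies if a != b}
    n = 0
    while len(alive) > 1 and n < max_rounds:
        n += 1
        # One fresh rollout return per surviving policy this round
        returns = {name: policies[name](rng) for name in alive}
        for a in alive:
            for b in alive:
                if a != b and returns[a] > returns[b]:
                    wins[(a, b)] += 1
        # Hoeffding-style confidence radius for the empirical win rates
        c = math.sqrt(math.log(4.0 * n * n / delta) / (2.0 * n))
        # Drop any policy whose lower-bounded loss probability exceeds 1/2
        beaten = {a for a in alive
                  if any(wins[(b, a)] / n - c > 0.5 for b in alive if b != a)}
        alive -= beaten
    return alive

# Hypothetical usage: three stochastic "policies" modelled as return samplers
pols = {
    "weak":   lambda r: r.gauss(0.0, 1.0),
    "medium": lambda r: r.gauss(0.5, 1.0),
    "strong": lambda r: r.gauss(1.5, 1.0),
}
print(race(pols))
```

With high probability the surviving set contains only the policy whose rollouts stochastically dominate the others; embedded in a rank-based evolutionary loop, such a racing step supplies the selection operator described in the abstract.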
Cite
Text
Busa-Fekete et al. "Preference-Based Reinforcement Learning: Evolutionary Direct Policy Search Using a Preference-Based Racing Algorithm." Machine Learning, 2014. doi:10.1007/s10994-014-5458-8
Markdown
[Busa-Fekete et al. "Preference-Based Reinforcement Learning: Evolutionary Direct Policy Search Using a Preference-Based Racing Algorithm." Machine Learning, 2014.](https://mlanthology.org/mlj/2014/busafekete2014mlj-preferencebased/) doi:10.1007/s10994-014-5458-8
BibTeX
@article{busafekete2014mlj-preferencebased,
title = {{Preference-Based Reinforcement Learning: Evolutionary Direct Policy Search Using a Preference-Based Racing Algorithm}},
author = {Busa-Fekete, Róbert and Szörényi, Balázs and Weng, Paul and Cheng, Weiwei and Hüllermeier, Eyke},
journal = {Machine Learning},
year = {2014},
pages = {327--351},
doi = {10.1007/s10994-014-5458-8},
volume = {97},
url = {https://mlanthology.org/mlj/2014/busafekete2014mlj-preferencebased/}
}