Preference-Based Reinforcement Learning: Evolutionary Direct Policy Search Using a Preference-Based Racing Algorithm

Abstract

We introduce a novel approach to preference-based reinforcement learning, namely a preference-based variant of a direct policy search method based on evolutionary optimization. The core of our approach is a preference-based racing algorithm that selects the best among a given set of candidate policies with high probability. To this end, the algorithm operates on a suitable ordinal preference structure and only uses pairwise comparisons between sample rollouts of the policies. Embedding the racing algorithm in a rank-based evolutionary search procedure, we show that approximations of the so-called Smith set of optimal policies can be produced with certain theoretical guarantees. Apart from a formal performance and complexity analysis, we present first experimental studies showing that our approach performs well in practice.
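To make the core idea concrete, below is a minimal Python sketch of a preference-based racing loop in the spirit of the paper's algorithm. It assumes a hypothetical pairwise oracle compare(i, j) that performs one rollout of policy i and one of policy j and reports which rollout is preferred; the Hoeffding-style elimination rule and the round-robin sampling schedule are illustrative simplifications, not the paper's exact procedure (which races the full pairwise preference relation to approximate the Smith set).

import math

def hoeffding_radius(n, delta):
    # Two-sided Hoeffding confidence radius for the mean of n Bernoulli samples.
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def preference_based_racing(policies, compare, delta=0.05, max_samples=10000):
    # Race a set of candidate policies using only pairwise rollout comparisons.
    # compare(i, j) is a hypothetical oracle: one rollout of policy i versus one
    # of policy j, returning 1 if i's rollout is preferred and 0 otherwise.
    k = len(policies)
    active = set(range(k))
    wins = [[0] * k for _ in range(k)]
    counts = [[0] * k for _ in range(k)]
    total = 0
    while len(active) > 1 and total < max_samples:
        # Sample one fresh comparison for every active pair.
        for i, j in [(a, b) for a in active for b in active if a < b]:
            outcome = compare(i, j)
            wins[i][j] += outcome
            wins[j][i] += 1 - outcome
            counts[i][j] += 1
            counts[j][i] += 1
            total += 1
        # Eliminate any policy beaten by some active rival with high
        # confidence (union bound over all ordered pairs).
        eliminated = set()
        for i in active:
            for j in active:
                if i == j or counts[j][i] == 0:
                    continue
                p_ji = wins[j][i] / counts[j][i]
                radius = hoeffding_radius(counts[j][i], delta / (k * (k - 1)))
                if p_ji - radius > 0.5:
                    eliminated.add(i)
                    break
        active -= eliminated
    return [policies[i] for i in active]

Embedded in a rank-based evolutionary search, a racing step of this kind would rank or filter the current population of candidate policies before selection and variation produce the next generation.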

Cite

Text

Busa-Fekete et al. "Preference-Based Reinforcement Learning: Evolutionary Direct Policy Search Using a Preference-Based Racing Algorithm." Machine Learning, 2014. doi:10.1007/s10994-014-5458-8

Markdown

[Busa-Fekete et al. "Preference-Based Reinforcement Learning: Evolutionary Direct Policy Search Using a Preference-Based Racing Algorithm." Machine Learning, 2014.](https://mlanthology.org/mlj/2014/busafekete2014mlj-preferencebased/) doi:10.1007/s10994-014-5458-8

BibTeX

@article{busafekete2014mlj-preferencebased,
  title     = {{Preference-Based Reinforcement Learning: Evolutionary Direct Policy Search Using a Preference-Based Racing Algorithm}},
  author    = {Busa-Fekete, Róbert and Szörényi, Balázs and Weng, Paul and Cheng, Weiwei and Hüllermeier, Eyke},
  journal   = {Machine Learning},
  year      = {2014},
  pages     = {327--351},
  doi       = {10.1007/s10994-014-5458-8},
  volume    = {97},
  url       = {https://mlanthology.org/mlj/2014/busafekete2014mlj-preferencebased/}
}