APRIL: Active Preference Learning-Based Reinforcement Learning
Abstract
This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, ruling out both standard RL and inverse reinforcement learning. Even with limited expertise, however, the human expert is often able to express preferences and rank the agent's demonstrations. Earlier work has presented an iterative preference-based RL framework: expert preferences are exploited to learn an approximate policy return, enabling the agent to perform direct policy search. Iteratively, the agent selects a new candidate policy and demonstrates it; the expert ranks the new demonstration against the previous best one; the expert's ranking feedback enables the agent to refine the approximate policy return, and the process is iterated. In this paper, preference-based reinforcement learning is combined with active ranking in order to decrease the number of ranking queries to the expert needed to yield a satisfactory policy. Experiments on the mountain car and cancer treatment testbeds show that a couple of dozen rankings suffice to learn a competent policy.
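The following is a rough, hypothetical sketch of the iterative loop described in the abstract, not a reproduction of the paper's algorithm: the agent proposes a candidate policy, a simulated expert ranks its demonstration against the previous best, and the accumulated preference pairs are used to refit a linear approximate policy return. All names (feature_dim, demonstrate, expert_prefers, etc.) and the greedy candidate selection are illustrative assumptions; APRIL's active ranking criterion is not implemented here.

```python
# Illustrative sketch of iterative preference-based policy search.
# The "expert" is simulated by a hidden utility vector, which stands in
# for the human expert's preferences over demonstrations.
import numpy as np

rng = np.random.default_rng(0)
feature_dim = 5                              # dimension of a trajectory feature vector (assumed)
hidden_w = rng.normal(size=feature_dim)      # hidden expert utility, unknown to the agent

def demonstrate(policy_params):
    """Stand-in for rolling out a policy: map parameters to noisy trajectory features."""
    return np.tanh(policy_params) + 0.05 * rng.normal(size=feature_dim)

def expert_prefers(phi_new, phi_best):
    """Simulated expert: ranks the new demonstration against the previous best one."""
    return hidden_w @ phi_new > hidden_w @ phi_best

def fit_ranking_model(pairs, lr=0.1, epochs=200):
    """Fit a linear approximate policy return from (winner, loser) feature pairs."""
    w = np.zeros(feature_dim)
    for _ in range(epochs):
        for winner, loser in pairs:
            if w @ (winner - loser) < 1.0:   # hinge-style ranking update
                w += lr * (winner - loser)
    return w

best_params = rng.normal(size=feature_dim)
best_phi = demonstrate(best_params)
pairs = []

for query in range(20):                      # a couple dozen ranking queries
    # Direct policy search: pick the candidate maximizing the learned return among
    # random perturbations of the current best (APRIL's active-selection criterion
    # would replace this purely greedy choice).
    w = fit_ranking_model(pairs) if pairs else rng.normal(size=feature_dim)
    candidates = best_params + 0.5 * rng.normal(size=(30, feature_dim))
    cand_params = max(candidates, key=lambda p: w @ demonstrate(p))
    cand_phi = demonstrate(cand_params)

    # Query the expert, record the preference pair, and keep the preferred policy.
    if expert_prefers(cand_phi, best_phi):
        pairs.append((cand_phi, best_phi))
        best_params, best_phi = cand_params, cand_phi
    else:
        pairs.append((best_phi, cand_phi))

print("hidden return of best demonstration:", float(hidden_w @ best_phi))
```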
Cite
Text
Akrour et al. "APRIL: Active Preference Learning-Based Reinforcement Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012. doi:10.1007/978-3-642-33486-3_8
Markdown
[Akrour et al. "APRIL: Active Preference Learning-Based Reinforcement Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012.](https://mlanthology.org/ecmlpkdd/2012/akrour2012ecmlpkdd-april/) doi:10.1007/978-3-642-33486-3_8
BibTeX
@inproceedings{akrour2012ecmlpkdd-april,
title = {{APRIL: Active Preference Learning-Based Reinforcement Learning}},
author = {Akrour, Riad and Schoenauer, Marc and Sebag, Michèle},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2012},
pages = {116-131},
doi = {10.1007/978-3-642-33486-3_8},
url = {https://mlanthology.org/ecmlpkdd/2012/akrour2012ecmlpkdd-april/}
}