Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning
Abstract
An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL). Yet, it is notoriously difficult in practice to design a correct reward function, even for simple tasks. Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback. However, despite its great empirical successes, HiL RL usually requires \emph{too much} feedback from the human teacher and also suffers from a lack of theoretical understanding. In this paper, we address this issue from a theoretical perspective, aiming to provide a provably feedback-efficient algorithmic framework that uses human-in-the-loop feedback to specify the rewards of given tasks. We provide an \emph{active-learning}-based RL algorithm that first explores the environment without specifying a reward function and then asks a human teacher only a few queries about the rewards of the task at selected state-action pairs. Afterwards, the algorithm is guaranteed to return a nearly optimal policy for the task with high probability. We show that, even in the presence of random noise in the feedback, the algorithm makes only $\tilde{O}(H \dim_{R}^2)$ queries to the reward function to output an $\epsilon$-optimal policy for any $\epsilon > 0$. Here $H$ is the horizon of the RL environment, and $\dim_{R}$ specifies the complexity of the function class representing the reward function. In contrast, standard RL algorithms must query the reward function at $\Omega(\operatorname{poly}(d, 1/\epsilon))$ state-action pairs, where $d$ depends on the complexity of the environmental transitions.
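To make the two-phase structure described in the abstract concrete, here is a minimal tabular sketch: reward-free exploration, a small number of active reward queries to a noisy teacher, then planning against the learned reward. Everything here is illustrative, not the paper's algorithm: the toy MDP (`P`, `true_r`), the Gaussian-noise teacher model (`noisy_teacher`), and the uniform-random exploration stand-in are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular MDP: S states, A actions, horizon H (all hypothetical).
S, A, H = 5, 3, 4
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel: P[s, a] is a distribution over s'
true_r = rng.uniform(size=(S, A))            # ground-truth reward, hidden from the agent

def noisy_teacher(s, a, sigma=0.1):
    """Assumed human-feedback model: the true reward plus Gaussian noise."""
    return true_r[s, a] + sigma * rng.normal()

# Phase 1: reward-free exploration. Uniform-random rollouts serve as a
# stand-in for a principled coverage-maximizing exploration scheme.
visits = np.zeros((S, A), dtype=int)
for _ in range(200):
    s = 0
    for _ in range(H):
        a = rng.integers(A)
        visits[s, a] += 1
        s = rng.choice(S, p=P[s, a])

# Phase 2: actively query the teacher only at covered state-action pairs,
# averaging a few noisy answers per pair; unvisited pairs keep estimate 0.
r_hat = np.zeros((S, A))
queried = np.argwhere(visits > 0)
for s, a in queried:
    r_hat[s, a] = np.mean([noisy_teacher(s, a) for _ in range(5)])

# Phase 3: finite-horizon value iteration against the learned reward r_hat.
V = np.zeros(S)
policy = np.zeros((H, S), dtype=int)
for h in reversed(range(H)):
    Q = r_hat + P @ V                        # Q[s, a] = r_hat[s, a] + E[V(s')]
    policy[h] = Q.argmax(axis=1)
    V = Q.max(axis=1)

print(f"queried {len(queried)} state-action pairs; greedy actions at h=0: {policy[0]}")
```

The point of the sketch is the ordering: exploration happens before any reward is specified, so the (expensive) human queries can be concentrated on a small, well-covered set of state-action pairs, matching the query-efficiency claim of the abstract.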
Cite

Kong and Yang. "Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning." Neural Information Processing Systems, 2022. https://mlanthology.org/neurips/2022/kong2022neurips-provably/

BibTeX:
@inproceedings{kong2022neurips-provably,
title = {{Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning}},
author = {Kong, Dingwen and Yang, Lin},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/kong2022neurips-provably/}
}