Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning

Abstract

An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL). Yet, designing a correct reward function is known to be extremely challenging in practice, even for simple tasks. Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback. However, despite its great empirical successes, HiL RL usually requires \emph{too much} feedback from a human teacher and also suffers from insufficient theoretical understanding. In this paper, we address this issue from a theoretical perspective, aiming to provide provably feedback-efficient algorithmic frameworks that use human-in-the-loop feedback to specify the rewards of given tasks. We provide an \emph{active-learning}-based RL algorithm that first explores the environment without a specified reward function and then queries a human teacher for the rewards of the task at only a few state-action pairs. After that, the algorithm is guaranteed to provide a nearly optimal policy for the task with high probability. We show that, even in the presence of random noise in the feedback, the algorithm makes only $\tilde{O}(H \dim_{R}^2)$ queries to the reward function to provide an $\epsilon$-optimal policy for any $\epsilon > 0$. Here $H$ is the horizon of the RL environment, and $\dim_{R}$ specifies the complexity of the function class representing the reward function. In contrast, standard RL algorithms must query the reward function at $\Omega(\operatorname{poly}(d, 1/\epsilon))$ state-action pairs, where $d$ depends on the complexity of the environment's transition dynamics.
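To make the pipeline concrete, below is a minimal sketch, not the authors' exact algorithm, of the explore-then-query scheme the abstract describes. It makes several simplifying assumptions of my own: a small tabular MDP with known transitions (standing in for the reward-free exploration phase), a linear reward model $r(s,a) = \langle \phi(s,a), \theta^* \rangle$ so that $\dim_R$ equals the feature dimension, noisy human feedback simulated as the true reward plus Gaussian noise, and query selection by a greedy D-optimal design. All names (`human_feedback`, `theta_star`, etc.) are hypothetical.

```python
# Hedged sketch of active reward learning + planning (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# --- toy MDP: S states, A actions, horizon H, known transitions P ----------
S, A, H, d_phi = 10, 3, 5, 4
P = rng.dirichlet(np.ones(S), size=(S, A))       # P[s, a] is a dist over s'
phi = rng.normal(size=(S, A, d_phi))             # reward features
theta_star = rng.normal(size=d_phi)              # unknown true reward param
noise_std = 0.1                                  # feedback noise level

def human_feedback(s, a):
    """Simulated noisy reward query to the 'human teacher'."""
    return phi[s, a] @ theta_star + noise_std * rng.normal()

# --- active reward learning: few queries via greedy D-optimal design -------
n_queries = 4 * d_phi                            # ~O(dim_R) queries, not poly(1/eps)
lam = 1e-3
Sigma = lam * np.eye(d_phi)                      # regularized design matrix
feats = phi.reshape(S * A, d_phi)
X, y = [], []
for _ in range(n_queries):
    Sigma_inv = np.linalg.inv(Sigma)
    # pick the state-action pair whose feature is least covered so far
    scores = np.einsum('nd,de,ne->n', feats, Sigma_inv, feats)
    idx = int(np.argmax(scores))
    s, a = divmod(idx, A)
    X.append(feats[idx])
    y.append(human_feedback(s, a))
    Sigma += np.outer(feats[idx], feats[idx])
X, y = np.array(X), np.array(y)
theta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d_phi), X.T @ y)

# --- plan with the learned reward via finite-horizon value iteration -------
r_hat = phi @ theta_hat                          # (S, A) estimated rewards
V = np.zeros(S)
pi = np.zeros((H, S), dtype=int)
for h in reversed(range(H)):
    Q = r_hat + P @ V                            # Q[s, a] = r_hat + E[V(s')]
    pi[h] = Q.argmax(axis=1)
    V = Q.max(axis=1)

# evaluate the learned policy under the true (unqueried) reward
r_true = phi @ theta_star
V_true = np.zeros(S)
for h in reversed(range(H)):
    V_true = r_true[np.arange(S), pi[h]] + P[np.arange(S), pi[h]] @ V_true
print(f"value of learned policy under true reward: {V_true.mean():.3f}")
```

The key design point mirrored here is that the number of reward queries scales with the complexity of the reward class (the feature dimension) rather than with $1/\epsilon$; exploration and planning handle the transition dynamics separately.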

Cite

Text

Kong and Yang. "Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning." Neural Information Processing Systems, 2022.

Markdown

[Kong and Yang. "Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/kong2022neurips-provably/)

BibTeX

@inproceedings{kong2022neurips-provably,
  title     = {{Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning}},
  author    = {Kong, Dingwen and Yang, Lin},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/kong2022neurips-provably/}
}