Policy Shaping: Integrating Human Feedback with Reinforcement Learning

Abstract

A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. State-of-the-art methods have approached this problem by mapping human information to reward and value signals to indicate preferences and then iterating over them to compute the necessary control policy. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct labels on the policy. We compare Advise to state-of-the-art approaches and highlight scenarios where it outperforms them and, importantly, is robust to infrequent and inconsistent human feedback.
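The abstract does not spell out how feedback becomes labels on the policy, so the Python sketch below only illustrates the general idea of policy shaping under stated assumptions: human labels are tallied per state-action pair, a consistency parameter C models how often a label agrees with the optimal action, and the resulting feedback distribution is multiplied into the learner's own action distribution. The class and method names (AdviseSketch, record_feedback, feedback_policy, shaped_policy) are hypothetical illustrations, not details taken from the paper.

    import numpy as np

    class AdviseSketch:
        """Sketch of policy shaping from binary human labels (assumed mechanics)."""

        def __init__(self, n_states, n_actions, consistency=0.8):
            # delta[s, a] = (# "right" labels) - (# "wrong" labels) for action a in state s
            self.delta = np.zeros((n_states, n_actions))
            self.C = consistency  # assumed probability a label matches the optimal action

        def record_feedback(self, state, action, positive):
            # Tally one human label for the action just taken.
            self.delta[state, action] += 1 if positive else -1

        def feedback_policy(self, state):
            # Probability each action is optimal given only the human labels.
            d = self.delta[state]
            p = self.C ** d / (self.C ** d + (1.0 - self.C) ** d)
            return p / p.sum()

        def shaped_policy(self, state, rl_policy):
            # Combine the RL agent's action distribution with the feedback
            # distribution by multiplying probabilities and renormalizing.
            combined = rl_policy * self.feedback_policy(state)
            return combined / combined.sum()

For example, with rl_policy coming from a softmax over the agent's Q-values, shaped_policy biases action selection toward actions the human has labeled as correct, and reduces to the agent's own distribution in states with no feedback (all deltas zero give a uniform feedback term).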

Cite

Text

Griffith et al. "Policy Shaping: Integrating Human Feedback with Reinforcement Learning." Neural Information Processing Systems, 2013.

Markdown

[Griffith et al. "Policy Shaping: Integrating Human Feedback with Reinforcement Learning." Neural Information Processing Systems, 2013.](https://mlanthology.org/neurips/2013/griffith2013neurips-policy/)

BibTeX

@inproceedings{griffith2013neurips-policy,
  title     = {{Policy Shaping: Integrating Human Feedback with Reinforcement Learning}},
  author    = {Griffith, Shane and Subramanian, Kaushik and Scholz, Jonathan and Isbell, Charles L. and Thomaz, Andrea L.},
  booktitle = {Neural Information Processing Systems},
  year      = {2013},
  pages     = {2625--2633},
  url       = {https://mlanthology.org/neurips/2013/griffith2013neurips-policy/}
}