Learning from Suboptimal Demonstration via Self-Supervised Reward Regression

Abstract

Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform a task by providing a human demonstration. However, modern LfD techniques, e.g., inverse reinforcement learning (IRL), assume users provide at least stochastically optimal demonstrations. This assumption fails to hold in most real-world scenarios. Recent attempts to learn from suboptimal demonstration leverage pairwise rankings and follow the Luce-Shepard rule. However, we show these approaches make incorrect assumptions and thus suffer from brittle, degraded performance. We overcome these limitations by developing a novel approach that bootstraps off suboptimal demonstrations to synthesize optimality-parameterized data to train an idealized reward function. We empirically validate that we learn an idealized reward function with ~0.95 correlation with ground-truth reward versus ~0.75 for prior work. We can then train policies achieving ~200% improvement over the suboptimal demonstration and ~90% improvement over prior work. We present a physical demonstration of teaching a robot a topspin strike in table tennis that achieves 32% faster returns and 40% more topspin than the user demonstration.
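
For readers unfamiliar with the Luce-Shepard rule mentioned in the abstract, it is commonly written as a softmax over the returns of the options being compared. The sketch below illustrates that general rule for a pair of trajectories; it is not the authors' implementation, and the function name `luce_shepard_preference` and the scalar-return inputs are assumptions made for illustration only.

```python
import math


def luce_shepard_preference(return_a: float, return_b: float) -> float:
    """Probability that trajectory A is preferred over trajectory B.

    Hypothetical helper: applies the Luce-Shepard (softmax) choice rule
    to two scalar trajectory returns.
    """
    # Subtract the larger return before exponentiating for numerical stability.
    shift = max(return_a, return_b)
    weight_a = math.exp(return_a - shift)
    weight_b = math.exp(return_b - shift)
    return weight_a / (weight_a + weight_b)


if __name__ == "__main__":
    # Equal returns give no preference; a higher return is preferred more often.
    print(luce_shepard_preference(1.0, 1.0))
    print(luce_shepard_preference(2.0, 1.0))
```

Under this rule the preference depends only on the difference between the two returns, so it is equivalent to a logistic function of that difference.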

Cite

Text

Chen et al. "Learning from Suboptimal Demonstration via Self-Supervised Reward Regression." Conference on Robot Learning, 2020.

Markdown

[Chen et al. "Learning from Suboptimal Demonstration via Self-Supervised Reward Regression." Conference on Robot Learning, 2020.](https://mlanthology.org/corl/2020/chen2020corl-learning-b/)

BibTeX

@inproceedings{chen2020corl-learning-b,
  title     = {{Learning from Suboptimal Demonstration via Self-Supervised Reward Regression}},
  author    = {Chen, Letian and Paleja, Rohan and Gombolay, Matthew},
  booktitle = {Conference on Robot Learning},
  year      = {2020},
  pages     = {1262-1277},
  volume    = {155},
  url       = {https://mlanthology.org/corl/2020/chen2020corl-learning-b/}
}