DROID: Learning from Offline Heterogeneous Demonstrations via Reward-Policy Distillation

Abstract

Offline Learning from Demonstrations (OLfD) is valuable in domains where trial-and-error learning is infeasible or specifying a cost function is difficult, such as robotic surgery, autonomous driving, and path-finding for NASA’s Mars rovers. However, two key problems remain challenging in OLfD: 1) heterogeneity: demonstration data can be generated with diverse preferences and strategies, and 2) generalizability: the learned policy and reward must perform well beyond a limited training regime in unseen test settings. To overcome these challenges, we propose Dual Reward and policy Offline Inverse Distillation (DROID), where the key idea is to leverage diversity to improve generalization performance by decomposing common-task and individual-specific strategies and distilling knowledge in both the reward and policy spaces. We ground DROID in a novel and uniquely challenging Mars rover path-planning problem for NASA’s Mars Curiosity Rover. We also curate a novel dataset spanning 163 Sols (Martian days) and conduct an empirical investigation to characterize heterogeneity in the dataset. We find DROID outperforms prior SOTA OLfD techniques, leading to a $26\%$ improvement in modeling expert behaviors and results $92\%$ closer to the task objective of reaching the final destination. We also benchmark DROID on the OpenAI Gym Cartpole environment and find DROID achieves $55\%$ better performance (a statistically significant improvement) in modeling heterogeneous demonstrations.
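To make the decomposition concrete, below is a minimal PyTorch sketch of the common/individual split the abstract describes: a shared common-task network paired with per-demonstrator residual heads in both the reward and policy spaces, plus a distillation loss coupling the individual and common policies. This is not the authors' implementation; the residual decomposition, all names, and the coefficient beta are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualDistillationModel(nn.Module):
    """Hypothetical decomposition: a shared (common-task) trunk plus
    individual-specific residual heads for both reward and policy."""

    def __init__(self, obs_dim, act_dim, n_demonstrators, hidden=64):
        super().__init__()
        # Common-task reward and policy, shared across demonstrators.
        self.common_reward = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.common_policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))
        # Individual-specific residuals, one pair of heads per demonstrator.
        self.indiv_rewards = nn.ModuleList(
            nn.Linear(obs_dim + act_dim, 1) for _ in range(n_demonstrators))
        self.indiv_policies = nn.ModuleList(
            nn.Linear(obs_dim, act_dim) for _ in range(n_demonstrators))

    def reward(self, obs, act, i):
        x = torch.cat([obs, act], dim=-1)
        return self.common_reward(x) + self.indiv_rewards[i](x)

    def policy_logits(self, obs, i):
        return self.common_policy(obs) + self.indiv_policies[i](obs)

def distillation_loss(model, obs, act, demo_id, beta=0.1):
    """Behavior cloning on the individual policy (discrete actions assumed,
    e.g. Cartpole), plus a KL term distilling the individual policy's
    knowledge into the common policy; beta is an assumed weighting."""
    logits_i = model.policy_logits(obs, demo_id)
    bc = F.cross_entropy(logits_i, act)  # act: LongTensor of action indices
    kl = F.kl_div(
        F.log_softmax(model.common_policy(obs), dim=-1),
        F.softmax(logits_i.detach(), dim=-1),
        reduction="batchmean")
    return bc + beta * kl

Under these assumptions, a symmetric KL term on the reward heads would distill in the reward space as well; the sketch shows only the policy-space term for brevity.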

Cite

Text

Jayanthi et al. "DROID: Learning from Offline Heterogeneous Demonstrations via Reward-Policy Distillation." Conference on Robot Learning, 2023.

Markdown

[Jayanthi et al. "DROID: Learning from Offline Heterogeneous Demonstrations via Reward-Policy Distillation." Conference on Robot Learning, 2023.](https://mlanthology.org/corl/2023/jayanthi2023corl-droid/)

BibTeX

@inproceedings{jayanthi2023corl-droid,
  title     = {{DROID: Learning from Offline Heterogeneous Demonstrations via Reward-Policy Distillation}},
  author    = {Jayanthi, Sravan and Chen, Letian and Balabanska, Nadya and Duong, Van and Scarlatescu, Erik and Ameperosa, Ezra and Zaidi, Zulfiqar Haider and Martin, Daniel and Del Matto, Taylor Keith and Ono, Masahiro and Gombolay, Matthew},
  booktitle = {Conference on Robot Learning},
  year      = {2023},
  pages     = {1547--1571},
  volume    = {229},
  url       = {https://mlanthology.org/corl/2023/jayanthi2023corl-droid/}
}