Deterministic and Discriminative Imitation (d2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency
Abstract
Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy regardless of the fact that these off-policy extensions could either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation (D2-Imitation), which operates by first partitioning samples into two replay buffers and then learning a deterministic policy via off-policy reinforcement learning. Our empirical results show that D2-Imitation is effective in achieving good sample efficiency, outperforming several off-policy extension approaches of adversarial imitation on many control tasks.
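The partitioning step mentioned in the abstract can be illustrated with a rough sketch. The abstract does not say how samples are assigned to the two buffers, so everything below is an assumption made for illustration: the `ReplayBuffer` class, the `partition_samples` helper, the `score_fn` callable (standing in for whatever discriminative signal is used), and the 0.5 threshold are all hypothetical names and values, not the authors' implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO buffer of transitions (hypothetical helper)."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Return at most batch_size transitions, drawn without replacement.
        items = list(self.storage)
        return random.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self.storage)


def partition_samples(transitions, score_fn, positive_buffer,
                      negative_buffer, threshold=0.5):
    """Route each transition to one of two buffers.

    score_fn is an assumed callable returning a score in [0, 1];
    the threshold is an illustrative choice, not taken from the paper.
    """
    for transition in transitions:
        if score_fn(transition) >= threshold:
            positive_buffer.add(transition)
        else:
            negative_buffer.add(transition)
```

An off-policy learner with a deterministic policy could then draw minibatches from both buffers via `sample`; how the two buffers are weighted or rewarded during TD learning is left open here because the abstract does not specify it.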
Cite
Text
Sun et al. "Deterministic and Discriminative Imitation (d2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I8.20813
Markdown
[Sun et al. "Deterministic and Discriminative Imitation (d2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/sun2022aaai-deterministic/) doi:10.1609/AAAI.V36I8.20813
BibTeX
@inproceedings{sun2022aaai-deterministic,
title = {{Deterministic and Discriminative Imitation (d2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency}},
author = {Sun, Mingfei and Devlin, Sam and Hofmann, Katja and Whiteson, Shimon},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2022},
pages = {8378-8385},
doi = {10.1609/AAAI.V36I8.20813},
url = {https://mlanthology.org/aaai/2022/sun2022aaai-deterministic/}
}