Learning a Domain-Agnostic Policy Through Adversarial Representation Matching for Cross-Domain Policy Transfer

Abstract

The low transferability of learned policies is one of the most critical problems limiting the applicability of learning-based solutions to decision-making tasks. In this paper, we present a way to align latent representations of states and actions between different domains by optimizing an adversarial objective. We train two models, a policy and a domain discriminator, on unpaired trajectories of proxy tasks through behavioral cloning and adversarial training. Once the latent representations are aligned between domains, the domain-agnostic part of a policy trained with any method in the source domain can be transferred to the target domain in a zero-shot manner. We empirically show that our simple approach achieves performance comparable to the latest methods in zero-shot cross-domain transfer. We also observe that our method performs well in transfer between domains with different complexities, where other approaches fail catastrophically.
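To make the training procedure described in the abstract concrete, the sketch below shows one possible reading of it in PyTorch: domain-specific state encoders and action decoders wrap a shared, domain-agnostic policy head, a discriminator classifies which domain a latent state came from, and the model is trained with behavioral cloning plus an adversarial term that makes the latents indistinguishable across domains. All module names, dimensions, and the loss weight are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of adversarial representation matching for cross-domain
# policy transfer. Architectures, dimensions, and weights are assumed.
import torch
import torch.nn as nn

state_dim_src, act_dim_src = 10, 4    # assumed source-domain dims
state_dim_tgt, act_dim_tgt = 17, 6    # assumed target-domain dims
latent_dim, latent_act_dim = 32, 8    # shared latent spaces

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

# Domain-specific encoders map each domain's states into a shared latent space.
enc_src, enc_tgt = mlp(state_dim_src, latent_dim), mlp(state_dim_tgt, latent_dim)
# Domain-agnostic policy head acts on the shared latent representation.
policy_head = mlp(latent_dim, latent_act_dim)
# Domain-specific decoders map latent actions back to each domain's action space.
dec_src, dec_tgt = mlp(latent_act_dim, act_dim_src), mlp(latent_act_dim, act_dim_tgt)
# Discriminator tries to tell which domain a latent state came from.
disc = mlp(latent_dim, 1)

bce = nn.BCEWithLogitsLoss()
opt_model = torch.optim.Adam(
    list(enc_src.parameters()) + list(enc_tgt.parameters())
    + list(policy_head.parameters())
    + list(dec_src.parameters()) + list(dec_tgt.parameters()), lr=3e-4)
opt_disc = torch.optim.Adam(disc.parameters(), lr=3e-4)

def train_step(s_src, a_src, s_tgt, a_tgt, adv_weight=0.1):
    """One step on unpaired proxy-task batches from each domain."""
    z_src, z_tgt = enc_src(s_src), enc_tgt(s_tgt)

    # Discriminator update: source latents labeled 1, target latents labeled 0.
    d_loss = bce(disc(z_src.detach()), torch.ones(len(s_src), 1)) + \
             bce(disc(z_tgt.detach()), torch.zeros(len(s_tgt), 1))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # Behavioral cloning in both domains through the shared policy head.
    bc_loss = nn.functional.mse_loss(dec_src(policy_head(z_src)), a_src) + \
              nn.functional.mse_loss(dec_tgt(policy_head(z_tgt)), a_tgt)

    # Adversarial term: encoders try to fool the discriminator by pushing
    # each domain's latents toward the opposite label.
    adv_loss = bce(disc(z_src), torch.zeros(len(s_src), 1)) + \
               bce(disc(z_tgt), torch.ones(len(s_tgt), 1))

    loss = bc_loss + adv_weight * adv_loss
    opt_model.zero_grad(); loss.backward(); opt_model.step()
    return d_loss.item(), bc_loss.item(), adv_loss.item()

if __name__ == "__main__":
    # Smoke test with random unpaired trajectories standing in for proxy tasks.
    s_src, a_src = torch.randn(64, state_dim_src), torch.randn(64, act_dim_src)
    s_tgt, a_tgt = torch.randn(64, state_dim_tgt), torch.randn(64, act_dim_tgt)
    for _ in range(3):
        print(train_step(s_src, a_src, s_tgt, a_tgt))
```

In this reading, the zero-shot transfer from the abstract corresponds to freezing a policy head trained in the source domain and swapping in the target-domain encoder and decoder around it, without any further training in the target domain.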

Cite

Text

Watahiki et al. "Learning a Domain-Agnostic Policy Through Adversarial Representation Matching for Cross-Domain Policy Transfer." NeurIPS 2022 Workshops: DeepRL, 2022.

Markdown

[Watahiki et al. "Learning a Domain-Agnostic Policy Through Adversarial Representation Matching for Cross-Domain Policy Transfer." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/watahiki2022neuripsw-learning/)

BibTeX

@inproceedings{watahiki2022neuripsw-learning,
  title     = {{Learning a Domain-Agnostic Policy Through Adversarial Representation Matching for Cross-Domain Policy Transfer}},
  author    = {Watahiki, Hayato and Iwase, Ryo and Unno, Ryosuke and Tsuruoka, Yoshimasa},
  booktitle = {NeurIPS 2022 Workshops: DeepRL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/watahiki2022neuripsw-learning/}
}