Disagreement Options: Task Adaptation Through Temporally Extended Actions

Hutsebaut-Buysse, Matthias; De Schepper, Tom; Mets, Kevin; Latré, Steven

doi:10.1007/978-3-030-86486-6_12

ECML-PKDD 2021 pp. 190-205

Abstract

Embodied AI, in which an agent learns through interaction with a physical environment, typically requires large amounts of interaction with the environment to learn how to solve new tasks. Training can be done in parallel, using simulated environments. However, once an agent is deployed, e.g., in a real-world setting, it is not yet clear how it can quickly adapt its knowledge to solve new tasks. In this paper, we propose a novel Hierarchical Reinforcement Learning (HRL) method that allows an agent, when confronted with a novel task, to switch between exploiting prior knowledge through temporally extended actions and exploring the environment. We resolve this trade-off by using the disagreement between the action distributions of selected, previously acquired policies. Relevant prior tasks are selected by measuring the cosine similarity of their attached natural language goals in a pre-trained word-embedding space. We analyze the resulting temporal abstractions and experimentally demonstrate their effectiveness in different environments. We show that our method is capable of solving new tasks using only a fraction of the environment interactions required when learning the task from scratch.
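The two ingredients named in the abstract, selecting prior tasks by cosine similarity of goal embeddings and switching on the disagreement between their action distributions, can be illustrated with a small sketch. This is one plausible reading and not the paper's implementation: the abstract does not specify the disagreement measure, so the mean pairwise Jensen-Shannon divergence, the `threshold` value, the three-dimensional toy embeddings, and all function names below are assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_prior_tasks(new_goal_vec, prior_goal_vecs, k=2):
    """Return the k prior-task goals most similar to the new goal.

    prior_goal_vecs maps a goal string to its embedding (hypothetical toy
    vectors here; in practice they would come from a pre-trained word embedding).
    """
    ranked = sorted(
        prior_goal_vecs,
        key=lambda goal: cosine_similarity(new_goal_vec, prior_goal_vecs[goal]),
        reverse=True,
    )
    return ranked[:k]


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (assumed measure)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def disagreement(action_dists):
    """Mean pairwise JS divergence over the selected policies' action distributions."""
    pairs = list(combinations(action_dists, 2))
    if not pairs:  # a single policy cannot disagree with itself
        return 0.0
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


def should_exploit(action_dists, threshold=0.1):
    """Exploit prior knowledge while the selected policies agree; otherwise explore.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return disagreement(action_dists) < threshold


# Toy example: hypothetical goal embeddings for three prior tasks.
prior_goals = {
    "go to the red ball": [0.8, 0.2, 0.1],
    "go to the blue door": [0.7, 0.0, 0.3],
    "pick up the key": [0.0, 0.1, 0.9],
}
new_goal = [0.9, 0.1, 0.0]  # stands in for "go to the red door"
selected = select_prior_tasks(new_goal, prior_goals, k=2)

# Action distributions of the selected policies in two different states.
agreeing = [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05]]
conflicting = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]]
print(selected, should_exploit(agreeing), should_exploit(conflicting))
```

In this toy setting the two prior tasks with the most similar goals are selected, the agent keeps exploiting in the state where their policies nearly coincide, and it falls back to exploration in the state where they point to different actions.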

Cite

Text

Hutsebaut-Buysse et al. "Disagreement Options: Task Adaptation Through Temporally Extended Actions." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86486-6_12

Markdown

[Hutsebaut-Buysse et al. "Disagreement Options: Task Adaptation Through Temporally Extended Actions." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/hutsebautbuysse2021ecmlpkdd-disagreement/) doi:10.1007/978-3-030-86486-6_12

BibTeX

@inproceedings{hutsebautbuysse2021ecmlpkdd-disagreement,
  title     = {{Disagreement Options: Task Adaptation Through Temporally Extended Actions}},
  author    = {Hutsebaut-Buysse, Matthias and De Schepper, Tom and Mets, Kevin and Latré, Steven},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2021},
  pages     = {190--205},
  doi       = {10.1007/978-3-030-86486-6_12},
  url       = {https://mlanthology.org/ecmlpkdd/2021/hutsebautbuysse2021ecmlpkdd-disagreement/}
}