Meta-Reinforcement Learning with Adaptation from Human Feedback via Preference-Order-Preserving Task Embedding
Abstract
This paper studies meta-reinforcement learning with adaptation from human feedback. It aims to pre-train a meta-model that can achieve few-shot adaptation to new tasks from human preference queries, without relying on reward signals. To solve this problem, we propose the framework of adaptation via Preference-Order-preserving EMbedding (POEM). During meta-training, the framework learns a task encoder, which maps tasks to a preference-order-preserving task embedding space, and a decoder, which maps the embeddings to task-specific policies. During adaptation from human feedback, the task encoder enables efficient inference of a new task's embedding from the preference queries, from which the task-specific policy is obtained. We provide a theoretical guarantee for the convergence of the adaptation process to the task-specific optimal policy and experimentally demonstrate its state-of-the-art performance, with substantial improvements over baseline methods.
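To make the adaptation step concrete, below is a minimal, hypothetical sketch in PyTorch. It is not the authors' implementation: the module names (EmbeddingConditionedPolicy, PreferenceScorer), the network shapes, and the use of a Bradley-Terry likelihood over trajectory summaries are all illustrative assumptions, not details confirmed by the paper. The sketch only shows the general pattern of freezing meta-trained components and optimizing a new task's embedding against human preference feedback, then conditioning the policy on that embedding.

# Hypothetical sketch of POEM-style adaptation from preference queries (not the authors' code).
# Assumptions: a low-dimensional task embedding, an embedding-conditioned policy as the decoder,
# and a Bradley-Terry preference likelihood over trajectory summaries.
import torch
import torch.nn as nn

EMB_DIM, OBS_DIM, ACT_DIM = 8, 4, 2


class EmbeddingConditionedPolicy(nn.Module):
    """Decoder: maps (observation, task embedding) to action outputs."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + EMB_DIM, 64), nn.Tanh(), nn.Linear(64, ACT_DIM)
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))


class PreferenceScorer(nn.Module):
    """Scores a trajectory summary under a candidate task embedding.

    Stands in for the meta-trained component that makes the embedding space
    preference-order-preserving; the actual architecture is given in the paper.
    """

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + EMB_DIM, 64), nn.Tanh(), nn.Linear(64, 1)
        )

    def forward(self, traj_summary, z):
        return self.net(torch.cat([traj_summary, z], dim=-1)).squeeze(-1)


def adapt_embedding(scorer, queries, steps=200, lr=1e-2):
    """Infer a new task's embedding from human preference queries.

    `queries` is a list of (summary_a, summary_b, label), with label = 1 if the
    human preferred trajectory a over b. A Bradley-Terry log-likelihood is
    maximized over the embedding z while the meta-trained scorer stays frozen.
    """
    for p in scorer.parameters():          # freeze the meta-trained component
        p.requires_grad_(False)
    z = torch.zeros(EMB_DIM, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = 0.0
        for summary_a, summary_b, label in queries:
            # Bradley-Terry logit: score difference between the two trajectories.
            logit = scorer(summary_a, z) - scorer(summary_b, z)
            target = torch.tensor(float(label))
            loss = loss + nn.functional.binary_cross_entropy_with_logits(logit, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()


if __name__ == "__main__":
    scorer, policy = PreferenceScorer(), EmbeddingConditionedPolicy()
    # Toy preference queries with random trajectory summaries.
    queries = [(torch.randn(OBS_DIM), torch.randn(OBS_DIM), 1) for _ in range(5)]
    z_new = adapt_embedding(scorer, queries)
    action = policy(torch.randn(OBS_DIM), z_new)  # task-specific policy conditioned on z_new
    print(z_new, action)

In the actual POEM framework, the embedding space is trained to be preference-order-preserving, which is what makes this kind of query-driven embedding inference efficient; the sketch only illustrates where the inferred embedding enters the adaptation loop and how it conditions the resulting policy.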
Cite
Text
Xu and Zhu. "Meta-Reinforcement Learning with Adaptation from Human Feedback via Preference-Order-Preserving Task Embedding." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown
[Xu and Zhu. "Meta-Reinforcement Learning with Adaptation from Human Feedback via Preference-Order-Preserving Task Embedding." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/xu2025icml-metareinforcement/)

BibTeX
@inproceedings{xu2025icml-metareinforcement,
title = {{Meta-Reinforcement Learning with Adaptation from Human Feedback via Preference-Order-Preserving Task Embedding}},
author = {Xu, Siyuan and Zhu, Minghui},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {69967--69991},
volume = {267},
url = {https://mlanthology.org/icml/2025/xu2025icml-metareinforcement/}
}