Reward Translation via Reward Machine in Semi-Alignable MDPs
Abstract
Knowledge transfer across domains can ease the burden of reward design in deep reinforcement learning. To this end, we define reward translation to describe the cross-domain reward transfer problem. However, current methods struggle with incompatible MDPs that are neither pairable nor time-alignable. This paper presents an adaptable reward translation framework, neural reward translation, built on semi-alignable MDPs, which enables efficient reward translation under relaxed constraints while handling the intricacies of incompatible MDPs. Since directly mapping semi-alignable MDPs and transferring rewards between them is inherently difficult, we introduce an indirect mapping through reward machines, constructed from limited human input or LLM-based automated learning. Graph-matching techniques then establish links between the reward machines of distinct environments, enabling cross-domain reward translation in semi-alignable MDP settings and broadening the applicability of DRL across domains. Experiments substantiate our approach's effectiveness on tasks in environments with semi-alignable MDPs.
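
To make the pipeline the abstract describes more concrete (reward machines as labeled transition graphs, graph matching between them, then reward lookup across domains), here is a minimal Python sketch. All names (RewardMachine, match_reward_machines, translate_reward) and the greedy label-overlap matcher are illustrative assumptions for exposition, not the paper's actual algorithm or API.

# Minimal sketch of cross-domain reward translation via reward machines.
# The greedy label-overlap matcher below is a stand-in assumption for the
# paper's graph-matching step; names and structures are hypothetical.
from dataclasses import dataclass, field

@dataclass
class RewardMachine:
    """A reward machine: finite states with propositionally labeled,
    reward-emitting transitions."""
    states: set = field(default_factory=set)
    # transitions[(state, proposition)] = (next_state, reward)
    transitions: dict = field(default_factory=dict)
    initial: str = "u0"

def match_reward_machines(src: RewardMachine, tgt: RewardMachine) -> dict:
    """Greedily match each source state to the target state whose outgoing
    transition labels overlap the most (a toy graph-matching criterion)."""
    def signature(rm, u):
        return frozenset(p for (s, p) in rm.transitions if s == u)
    mapping = {}
    for u in src.states:
        mapping[u] = max(
            tgt.states,
            key=lambda v: len(signature(src, u) & signature(tgt, v)))
    return mapping

def translate_reward(tgt: RewardMachine, mapping: dict,
                     u: str, proposition: str) -> float:
    """Translate a reward query on the source machine by looking up the
    matched target-machine transition."""
    hit = tgt.transitions.get((mapping[u], proposition))
    return hit[1] if hit is not None else 0.0

# Toy example: two "fetch key, then open door" machines from different
# domains whose terminal rewards differ in scale.
src = RewardMachine(states={"u0", "u1"},
                    transitions={("u0", "key"): ("u1", 0.0),
                                 ("u1", "door"): ("u1", 1.0)})
tgt = RewardMachine(states={"v0", "v1"},
                    transitions={("v0", "key"): ("v1", 0.0),
                                 ("v1", "door"): ("v1", 10.0)})
mapping = match_reward_machines(src, tgt)
print(translate_reward(tgt, mapping, "u1", "door"))  # 10.0

Running the sketch maps u0 to v0 and u1 to v1 by their shared labels, so a reward query issued in the source domain is answered with the matched target-domain reward (here 10.0).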
Cite
Text
Hua et al. "Reward Translation via Reward Machine in Semi-Alignable MDPs." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Hua et al. "Reward Translation via Reward Machine in Semi-Alignable MDPs." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/hua2025icml-reward/)
BibTeX
@inproceedings{hua2025icml-reward,
title = {{Reward Translation via Reward Machine in Semi-Alignable MDPs}},
author = {Hua, Yun and Chen, Haosheng and Li, Wenhao and Jin, Bo and Wang, Baoxiang and Zha, Hongyuan and Wang, Xiangfeng},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {24912--24931},
volume = {267},
url = {https://mlanthology.org/icml/2025/hua2025icml-reward/}
}