Reward Translation via Reward Machine in Semi-Alignable MDPs

Abstract

Knowledge transfer across domains can ease the complexity of reward design in deep reinforcement learning (DRL). To this end, we define reward translation as the problem of transferring rewards across domains. However, current methods struggle with incompatible MDPs that are neither pairable nor time-alignable. This paper presents neural reward translation, an adaptable reward translation framework built on semi-alignable MDPs. It enables efficient reward translation under relaxed constraints while handling the intricacies of incompatible MDPs. Because directly mapping semi-alignable MDPs and transferring rewards between them is inherently difficult, we introduce an indirect mapping through reward machines, which are created with limited human input or learned automatically with large language models (LLMs). Graph-matching techniques then link the reward machines of distinct environments, enabling cross-domain reward translation in semi-alignable MDP settings and broadening the applicability of DRL across domains. Experiments on tasks in environments with semi-alignable MDPs demonstrate the effectiveness of our approach.
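The abstract describes the pipeline only at a high level: build a reward machine for each environment, match the two machines as graphs, then carry rewards across the matched structure. The toy sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes a reward machine is a dict of labeled transitions `(state, event) -> (next_state, reward)`, that both machines have the same number of states, and it uses a brute-force permutation search that maximizes preserved edges as a stand-in for the paper's graph-matching technique. All names (`RewardMachine`, `match_states`, `translate_rewards`, and the event labels) are hypothetical.

```python
from itertools import permutations


class RewardMachine:
    """Hypothetical minimal reward machine: (state, event) -> (next_state, reward)."""

    def __init__(self, states, initial, transitions):
        self.states = list(states)
        self.initial = initial
        self.transitions = dict(transitions)

    def step(self, u, event):
        # Events without an outgoing edge leave the state unchanged with zero reward.
        return self.transitions.get((u, event), (u, 0.0))

    def edges(self):
        # Unlabeled directed edges (self-loops dropped), used only for structural matching.
        return {(u, v) for (u, _), (v, _) in self.transitions.items() if u != v}


def match_states(src, tgt):
    """Brute-force graph matching (a stand-in, practical only for tiny machines):
    choose the bijection of source states to target states, with initial states
    anchored to each other, that preserves the most directed edges."""
    if len(src.states) != len(tgt.states):
        raise ValueError("this sketch assumes equally sized reward machines")
    src_edges, tgt_edges = src.edges(), tgt.edges()
    best, best_score = None, -1
    for perm in permutations(tgt.states):
        mapping = dict(zip(src.states, perm))
        if mapping[src.initial] != tgt.initial:
            continue
        score = sum((mapping[u], mapping[v]) in tgt_edges for u, v in src_edges)
        if score > best_score:
            best, best_score = mapping, score
    return best


def translate_rewards(src, tgt, mapping):
    """Copy each source edge's reward onto the matched target edge (0.0 if unmatched)."""
    src_reward = {(mapping[u], mapping[v]): r for (u, _), (v, r) in src.transitions.items()}
    translated = {
        (u, e): (v, src_reward.get((u, v), 0.0)) for (u, e), (v, _) in tgt.transitions.items()
    }
    return RewardMachine(tgt.states, tgt.initial, translated)


# Source task is rewarded; target task has the same structure but different
# event labels and no reward yet.
src = RewardMachine([0, 1, 2], 0, {(0, "got_key"): (1, 0.0), (1, "opened_door"): (2, 1.0)})
tgt = RewardMachine(["a", "b", "c"], "a",
                    {("a", "pick_card"): ("b", 0.0), ("b", "swipe_card"): ("c", 0.0)})
mapping = match_states(src, tgt)
print(mapping)  # {0: 'a', 1: 'b', 2: 'c'}
print(translate_rewards(src, tgt, mapping).step("b", "swipe_card"))  # ('c', 1.0)
```

Matching on graph structure rather than on event names is what allows the reward for "opened_door" to land on "swipe_card" even though the two environments share no labels. A realistic matcher would need to scale beyond permutation search and handle machines of different sizes.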

Cite

Text

Hua et al. "Reward Translation via Reward Machine in Semi-Alignable MDPs." Proceedings of the 42nd International Conference on Machine Learning, 2025.

BibTeX

@inproceedings{hua2025icml-reward,
  title     = {{Reward Translation via Reward Machine in Semi-Alignable MDPs}},
  author    = {Hua, Yun and Chen, Haosheng and Li, Wenhao and Jin, Bo and Wang, Baoxiang and Zha, Hongyuan and Wang, Xiangfeng},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {24912-24931},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/hua2025icml-reward/}
}