Translate Policy to Language: Flow Matching Generated Rewards for LLM Explanations

Abstract

As humans increasingly share environments with diverse agents powered by RL, LLMs, and beyond, the ability to explain agent policies in natural language is vital for reliable coexistence. We introduce a general-purpose framework that trains explanation-generating LLMs via reinforcement learning from AI feedback, with distributional rewards generated by continuous normalizing flows (CNFs). CNFs capture the pluralistic and probabilistic nature of human judgments about explanations. Moreover, under mild assumptions, CNFs provably bound deviations from true human reward distributions when trained on noisy proxy rewards from LLMs. We design a specialized CNF architecture that selectively attends to linguistic cues in the decision context and explanations when generating rewards. Human and LLM evaluators find that our method delivers explanations that enable more accurate predictions of true agent decisions, exhibit greater logical soundness and actionability, and impose lower cognitive load than explanations from models trained with proxy LLM rewards or with state-of-the-art RLHF and RLAIF baselines.
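To make the idea of flow-matching-generated rewards concrete, the following is a minimal toy sketch, not the paper's architecture. It assumes scalar rewards, a straight-line (linear interpolation) probability path, and a velocity field that is linear in the reward within each time bin; it does not condition on decision context or explanation text, which the paper's specialized CNF does. All names (`fit_velocity_field`, `sample_rewards`, `proxy_rewards`) and the synthetic Gaussian "proxy rewards" are hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def fit_velocity_field(rewards, n_bins=50, n_pairs=100_000, seed=0):
    """Fit v(x, t) = a_k + b_k * x per time bin k by least squares on the
    conditional flow matching target (x1 - x0). Toy assumption: a velocity
    field linear in x, which suits roughly Gaussian reward distributions."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n_pairs)        # source samples (noise)
    x1 = rng.choice(rewards, size=n_pairs)   # target samples (proxy rewards)
    t = rng.uniform(0.0, 1.0, size=n_pairs)  # random times in [0, 1)
    xt = (1.0 - t) * x0 + t * x1             # point on the straight-line path
    target = x1 - x0                         # velocity of that path
    bins = np.minimum((t * n_bins).astype(int), n_bins - 1)
    coeffs = np.zeros((n_bins, 2))
    for k in range(n_bins):
        mask = bins == k
        design = np.stack([np.ones(mask.sum()), xt[mask]], axis=1)
        coeffs[k] = np.linalg.lstsq(design, target[mask], rcond=None)[0]
    return coeffs

def sample_rewards(coeffs, n_samples=5_000, seed=1):
    """Generate rewards by Euler-integrating dx/dt = v(x, t) from noise,
    taking one step per time bin."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    dt = 1.0 / len(coeffs)
    for a, b in coeffs:
        x = x + dt * (a + b * x)
    return x

# Synthetic stand-in for noisy proxy rewards (an assumption, not paper data).
rng = np.random.default_rng(42)
proxy_rewards = rng.normal(loc=2.0, scale=0.5, size=10_000)

coeffs = fit_velocity_field(proxy_rewards)
generated = sample_rewards(coeffs)
print(f"generated mean={generated.mean():.3f}, std={generated.std():.3f}")
```

Because the flow produces samples rather than a single score, a downstream RL step could draw several rewards per explanation and use their spread as well as their mean. A practical system would replace the binned linear fit with a neural velocity network conditioned on the context and explanation.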

Cite

Text

Yang et al. "Translate Policy to Language: Flow Matching Generated Rewards for LLM Explanations." International Conference on Learning Representations, 2026.

Markdown

[Yang et al. "Translate Policy to Language: Flow Matching Generated Rewards for LLM Explanations." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/yang2026iclr-translate/)

BibTeX

@inproceedings{yang2026iclr-translate,
  title     = {{Translate Policy to Language: Flow Matching Generated Rewards for LLM Explanations}},
  author    = {Yang, Xinyi and Zeng, Liang and Dong, Heng and Yu, Chao and Wu, Xiaoran and Yang, Huazhong and Wang, Yu and Tambe, Milind and Wang, Tonghan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/yang2026iclr-translate/}
}