Online-to-Offline RL for Agent Alignment

Abstract

Reinforcement learning (RL) has shown remarkable success in training high-performing agent policies, particularly in domains such as Game AI, where simulation environments enable efficient interaction. However, despite their success in maximizing returns, such online-trained policies often fail to align with human preferences concerning actions, styles, and values. The challenge lies in efficiently adapting these online-trained policies to human preferences, given the scarcity and high cost of collecting human behavior data. In this work, we formalize the problem as *online-to-offline* RL and propose ALIGNment of Game AI to Preferences (ALIGN-GAP), an approach for aligning well-trained game agents with human preferences. Our method features a carefully designed reward model that encodes human preferences from limited offline data and incorporates curriculum-based preference learning to align RL agents with targeted human preferences. Experiments across diverse environments and preference types show that ALIGN-GAP achieves effective alignment with human preferences.
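
As a rough illustration of the kind of preference-encoding reward model the abstract mentions, the sketch below trains a generic Bradley-Terry-style reward network on offline pairwise comparisons. The architecture, data format, and training loop are assumptions made for this sketch and are not taken from the ALIGN-GAP paper.

```python
# Illustrative only: a generic Bradley-Terry preference reward model trained on
# offline pairwise comparisons. Architecture, data shapes, and hyperparameters
# are assumptions for the sketch, not details of ALIGN-GAP.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a (state, action) pair to a scalar reward."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(model, seg_a, seg_b, prefer_a):
    """Bradley-Terry loss. seg_a and seg_b are (obs, act) tensors of shape
    [B, T, dim]; prefer_a is 1.0 where the human preferred segment A."""
    r_a = model(*seg_a).sum(dim=-1)  # total predicted reward of segment A
    r_b = model(*seg_b).sum(dim=-1)  # total predicted reward of segment B
    logits = r_a - r_b               # P(A preferred) = sigmoid(r_a - r_b)
    return nn.functional.binary_cross_entropy_with_logits(logits, prefer_a)

# Toy usage: one gradient step on a random batch of labeled comparisons.
if __name__ == "__main__":
    obs_dim, act_dim, B, T = 8, 2, 32, 10
    model = RewardModel(obs_dim, act_dim)
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)
    seg_a = (torch.randn(B, T, obs_dim), torch.randn(B, T, act_dim))
    seg_b = (torch.randn(B, T, obs_dim), torch.randn(B, T, act_dim))
    prefer_a = torch.randint(0, 2, (B,)).float()
    loss = preference_loss(model, seg_a, seg_b, prefer_a)
    opt.zero_grad(); loss.backward(); opt.step()
```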

Cite

Text

Liu et al. "Online-to-Offline RL for Agent Alignment." International Conference on Learning Representations, 2025.

Markdown

[Liu et al. "Online-to-Offline RL for Agent Alignment." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/liu2025iclr-onlinetooffline/)

BibTeX

@inproceedings{liu2025iclr-onlinetooffline,
  title     = {{Online-to-Offline RL for Agent Alignment}},
  author    = {Liu, Xu and Fu, Haobo and Albrecht, Stefano V. and Fu, Qiang and Li, Shuai},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/liu2025iclr-onlinetooffline/}
}