Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning

Abstract

Guiding the policy of multi-agent reinforcement learning to align with human common sense is a difficult problem, largely due to the complexity of modeling common sense as a reward, especially in complex and long-horizon multi-agent tasks. Recent works have shown the effectiveness of reward shaping, such as potential-based rewards, in enhancing policy alignment. Existing works, however, primarily rely on experts to design rule-based rewards, which are often labor-intensive to build and lack a high-level semantic understanding of common sense. To solve this problem, we propose a hierarchical vision-based reward shaping method. At the bottom layer, a visual-language model (VLM) serves as a generic potential function, guiding the policy to align with human common sense through its intrinsic semantic understanding. To help the policy adapt to uncertainty and changes in long-horizon tasks, the top layer features an adaptive skill selection module based on a visual large language model (vLLM). The module uses instructions, video replays, and training records to dynamically select a suitable potential function from a pre-designed pool. In addition, our method is theoretically proven to preserve the optimal policy. Extensive experiments conducted in the Google Research Football environment demonstrate that our method not only achieves a higher win rate but also effectively aligns the policy with human common sense.
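
As background for the potential-based rewards mentioned above, the following is a minimal sketch of the standard potential-based shaping term, F(s, s') = gamma * Phi(s') - Phi(s), which is the general form known to leave the optimal policy unchanged. It is not the paper's implementation: `toy_potential` is a hypothetical stand-in for the VLM-based potential (here just a closeness score on a single number), and the names `shaped_reward`, `gamma`, and `done` are assumptions introduced for illustration. Treating the terminal potential as zero is one common convention, not something stated in the abstract.

```python
from typing import Callable, Sequence

State = Sequence[float]


def toy_potential(state: State) -> float:
    """Hypothetical potential: higher when state[0] is closer to 1.0.

    In the paper a VLM plays this role; this stand-in only illustrates
    the interface (state in, scalar score out).
    """
    return -abs(state[0] - 1.0)


def shaped_reward(
    env_reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Add the potential-based shaping term to the environment reward."""
    # Assumption: the potential of a terminal state is taken to be zero.
    next_phi = 0.0 if done else potential(next_state)
    return env_reward + gamma * next_phi - potential(state)


if __name__ == "__main__":
    trajectory = [[0.0], [0.5], [1.0]]
    total = sum(
        shaped_reward(0.0, s, s_next, toy_potential, gamma=1.0)
        for s, s_next in zip(trajectory, trajectory[1:])
    )
    # With gamma = 1 the shaping terms telescope to Phi(last) - Phi(first).
    print(total)
```

With `gamma = 1.0` the shaping terms along a trajectory telescope to the difference between the final and initial potentials, which gives some intuition for why shaping of this form does not change which policy is optimal.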

Cite

Text

Ma et al. "Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I18.34123

Markdown

[Ma et al. "Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/ma2025aaai-vision/) doi:10.1609/AAAI.V39I18.34123

BibTeX

@inproceedings{ma2025aaai-vision,
  title     = {{Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning}},
  author    = {Ma, Hao and Wang, Shijie and Pu, Zhiqiang and Zhao, Siyao and Ai, Xiaolin},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {19287-19295},
  doi       = {10.1609/AAAI.V39I18.34123},
  url       = {https://mlanthology.org/aaai/2025/ma2025aaai-vision/}
}