Decoding Global Preferences: Temporal and Cooperative Dependency Modeling in Multi-Agent Preference-Based Reinforcement Learning

Tianchen Zhu, Yue Qiu, Haoyi Zhou, Jianxin Li

AAAI 2024 pp. 17202-17210

doi:10.1609/AAAI.V38I15.29666

Abstract

Designing accurate reward functions for reinforcement learning (RL) has long been challenging. Preference-based RL (PbRL) offers a promising approach by using human preferences to train agents, eliminating the need for manual reward design. While successful in single-agent tasks, extending PbRL to complex multi-agent scenarios is nontrivial. Existing PbRL methods lack the capacity to comprehensively capture both temporal and cooperative aspects, leading to inadequate reward functions. This work introduces an advanced multi-agent preference learning framework that effectively addresses these limitations. Based on a cascading Transformer architecture, our approach captures both temporal and cooperative dependencies, alleviating issues related to reward uniformity and intricate interactions among agents. Experimental results demonstrate substantial performance improvements in multi-agent cooperative tasks, and the reconstructed reward function closely resembles expert-defined reward functions. The source code is available at https://github.com/catezi/MAPT.

Cite

Text

Zhu et al. "Decoding Global Preferences: Temporal and Cooperative Dependency Modeling in Multi-Agent Preference-Based Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I15.29666

Markdown

[Zhu et al. "Decoding Global Preferences: Temporal and Cooperative Dependency Modeling in Multi-Agent Preference-Based Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/zhu2024aaai-decoding/) doi:10.1609/AAAI.V38I15.29666

BibTeX

@inproceedings{zhu2024aaai-decoding,
  title     = {{Decoding Global Preferences: Temporal and Cooperative Dependency Modeling in Multi-Agent Preference-Based Reinforcement Learning}},
  author    = {Zhu, Tianchen and Qiu, Yue and Zhou, Haoyi and Li, Jianxin},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {17202--17210},
  doi       = {10.1609/AAAI.V38I15.29666},
  url       = {https://mlanthology.org/aaai/2024/zhu2024aaai-decoding/}
}