DPM: Dual Preferences-Based Multi-Agent Reinforcement Learning
Abstract
Multi-agent reinforcement learning (MARL) has demonstrated strong performance across various domains but still struggles in sparse-reward environments. Preference-based Reinforcement Learning (PbRL) offers a promising solution by leveraging human preferences to transform sparse rewards into dense ones, yet its application to MARL remains under-explored. We propose Dual Preferences-based Multi-Agent Reinforcement Learning (DPM), which extends PbRL to MARL by introducing preferences that compare not only trajectories but also the contributions of individual agents. Moreover, we introduce a novel method that leverages Large Language Models (LLMs) to gather preferences, addressing the challenges of human-based preference collection. Experimental results in the StarCraft Multi-Agent Challenge (SMAC) environment show significant performance improvements over baselines, demonstrating the efficacy of DPM in optimizing individual reward functions and improving performance in sparse-reward settings.
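For context, the abstract's mention of learning dense reward functions from preferences follows the standard PbRL recipe of fitting a reward model with a Bradley-Terry comparison loss. The sketch below is a minimal, hypothetical illustration of that recipe for a single agent's reward model; the names (`RewardModel`, `preference_loss`, segment tensors) and dimensions are assumptions for illustration, not the authors' released code or the full dual-preference objective of DPM.

```python
# Minimal Bradley-Terry-style reward learning sketch (illustrative assumption,
# not the DPM implementation): fit a per-agent reward model so that preferred
# trajectory segments receive higher predicted return.
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Predicts a per-step reward from one agent's observation-action features."""

    def __init__(self, obs_act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs_act: torch.Tensor) -> torch.Tensor:
        # obs_act: (T, obs_act_dim) -> (T,) predicted rewards over a segment
        return self.net(obs_act).squeeze(-1)


def preference_loss(model: RewardModel,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    pref: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry cross-entropy; pref = 1.0 if segment A is preferred, 0.0 if B."""
    ret_a = model(seg_a).sum()  # predicted return of segment A
    ret_b = model(seg_b).sum()  # predicted return of segment B
    log_probs = torch.log_softmax(torch.stack([ret_a, ret_b]), dim=0)
    return -(pref * log_probs[0] + (1.0 - pref) * log_probs[1])


# Toy usage: one labeled comparison between two 10-step segments.
model = RewardModel(obs_act_dim=8)
seg_a, seg_b = torch.randn(10, 8), torch.randn(10, 8)
loss = preference_loss(model, seg_a, seg_b, pref=torch.tensor(1.0))
loss.backward()
```

In DPM the same comparison idea is applied twice, to whole trajectories and to individual agents' contributions, which is what the "dual preferences" in the title refers to.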
Cite
Text
Kang et al. "DPM: Dual Preferences-Based Multi-Agent Reinforcement Learning." ICML 2024 Workshops: MFHAIA, 2024.
Markdown
[Kang et al. "DPM: Dual Preferences-Based Multi-Agent Reinforcement Learning." ICML 2024 Workshops: MFHAIA, 2024.](https://mlanthology.org/icmlw/2024/kang2024icmlw-dpm/)
BibTeX
@inproceedings{kang2024icmlw-dpm,
  title = {{DPM: Dual Preferences-Based Multi-Agent Reinforcement Learning}},
  author = {Kang, Sehyeok and Lee, Yongsik and Yun, Se-Young},
  booktitle = {ICML 2024 Workshops: MFHAIA},
  year = {2024},
  url = {https://mlanthology.org/icmlw/2024/kang2024icmlw-dpm/}
}