Efficient Exploration in Multi-Agent Reinforcement Learning via Farsighted Self-Direction
Abstract
Multi-agent reinforcement learning faces greater challenges in efficient exploration than its single-agent counterpart, primarily due to the exponential growth of the joint state and action spaces. Methods based on intrinsic rewards have proven effective at improving exploration efficiency in multi-agent scenarios. However, these methods suffer from instability during training and bias in the exploration direction. To address these challenges, we propose Farsighted Self-Direction (FSD), a novel model-free method that uses a long-term exploration bonus to achieve coordinated exploration. Since the prediction error against individual Q-values indicates a potential bonus for committed exploration, it is incorporated into action selection to directly guide coordinated exploration. Furthermore, we use clipped double Q-learning to reduce noise in the prediction error. We validate the method on didactic examples and show that it outperforms existing methods on challenging StarCraft II micromanagement tasks.
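For intuition only, below is a minimal sketch of how a prediction-error bonus and clipped double Q-learning could enter greedy action selection, as described at a high level in the abstract. This is not the authors' implementation; the function and parameter names (`select_action`, `pred_q`, `beta`, `epsilon`) are illustrative assumptions.

```python
import numpy as np

def select_action(q1, q2, pred_q, beta=0.1, epsilon=0.05, rng=None):
    """Greedy action selection with an exploration bonus (illustrative sketch).

    q1, q2:  per-action estimates from two Q-heads (clipped double Q-learning).
    pred_q:  per-action outputs of an auxiliary predictor network; its error
             against the Q-values serves as the exploration bonus.
    beta:    weight of the exploration bonus.
    epsilon: residual probability of a uniformly random action.
    """
    rng = rng or np.random.default_rng()
    q_min = np.minimum(q1, q2)        # clipped double Q: take the element-wise minimum to reduce noise
    bonus = np.abs(pred_q - q_min)    # prediction error against Q-values acts as the bonus
    scores = q_min + beta * bonus     # bonus biases selection toward under-explored actions
    if rng.random() < epsilon:        # keep a small amount of random exploration
        return int(rng.integers(len(scores)))
    return int(np.argmax(scores))

# Toy usage with 4 actions
q1 = np.array([1.0, 0.5, 0.2, 0.8])
q2 = np.array([0.9, 0.7, 0.1, 1.1])
pred_q = np.array([0.8, 0.9, 0.6, 0.9])
print(select_action(q1, q2, pred_q))
```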
Cite
Text
Lao et al. "Efficient Exploration in Multi-Agent Reinforcement Learning via Farsighted Self-Direction." Transactions on Machine Learning Research, 2025.
Markdown
[Lao et al. "Efficient Exploration in Multi-Agent Reinforcement Learning via Farsighted Self-Direction." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/lao2025tmlr-efficient/)
BibTeX
@article{lao2025tmlr-efficient,
  title = {{Efficient Exploration in Multi-Agent Reinforcement Learning via Farsighted Self-Direction}},
  author = {Lao, Tiancheng and Guo, Xudong and Liu, Mengge and Yu, Junjie and Liu, Yi and Fan, Wenhui},
  journal = {Transactions on Machine Learning Research},
  year = {2025},
  url = {https://mlanthology.org/tmlr/2025/lao2025tmlr-efficient/}
}