QUICR-Learning for Multi-Agent Coordination

Agogino, Adrian K.; Tumer, Kagan

AAAI 2006 pp. 1438-1443

Abstract

Coordinating multiple agents that need to perform a sequence of actions to maximize a system-level reward requires solving two distinct credit assignment problems. First, credit must be assigned for an action taken at time step t that results in a reward at time step t′ > t. Second, credit must be assigned for the contribution of agent i to the overall system performance. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning. The second credit assignment problem is typically addressed by creating custom reward functions. To address both credit assignment problems simultaneously, we propose “Q Updates with Immediate Counterfactual Rewards-learning” (QUICR-learning), designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. QUICR-learning is based on previous work on single-time-step counterfactual rewards described by the collectives framework. Results on a traffic congestion problem show that QUICR-learning is significantly better than a Q-learner using collectives-based (single-time-step counterfactual) rewards. In addition, QUICR-learning provides significant gains over conventional and local Q-learning. Additional results on a multi-agent grid-world problem show that the improvements due to QUICR-learning are not domain specific and can provide up to a tenfold increase in performance over existing methods.
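The two credit assignment problems named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's exact formulation: the abstract gives no equations, so the congestion-style `system_reward`, the way agent i is removed in `counterfactual_reward`, and all function and parameter names are assumptions made for illustration. The sketch pairs a standard tabular Q-learning update (temporal credit) with a reward that measures how much the system reward changes when one agent's action is taken out (per-agent credit).

```python
import math
from collections import Counter, defaultdict


def system_reward(actions, capacity=1.0):
    """Hypothetical system-level reward for a congestion-style problem.

    Each entry of `actions` is the resource an agent chose; a resource used by
    n agents contributes n * exp(-n / capacity), so overcrowding is penalised.
    """
    counts = Counter(actions)
    return sum(n * math.exp(-n / capacity) for n in counts.values())


def counterfactual_reward(actions, i, capacity=1.0):
    """System reward minus the system reward with agent i removed.

    Assumption: the counterfactual simply drops agent i's action; the paper
    may define the counterfactual differently.
    """
    without_i = actions[:i] + actions[i + 1:]
    return system_reward(actions, capacity) - system_reward(without_i, capacity)


def q_update(q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update applied to one agent's Q-table."""
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Usage: three agents choose between two resources; agent 2 updates its own
# Q-table with its counterfactual reward instead of the raw system reward.
joint_action = [0, 0, 1]
q_agent2 = defaultdict(float)
r2 = counterfactual_reward(joint_action, 2)
q_update(q_agent2, "s0", joint_action[2], r2, "s1", n_actions=2)
```

Feeding each agent its own counterfactual reward, rather than the shared system reward, is the design choice the sketch is meant to show: the signal an agent learns from depends mostly on its own action, which is the property the abstract attributes to counterfactual rewards.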

Cite

Text

Agogino and Tumer. "QUICR-Learning for Multi-Agent Coordination." AAAI Conference on Artificial Intelligence, 2006.

Markdown

[Agogino and Tumer. "QUICR-Learning for Multi-Agent Coordination." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/agogino2006aaai-quicr/)

BibTeX

@inproceedings{agogino2006aaai-quicr,
  title     = {{QUICR-Learning for Multi-Agent Coordination}},
  author    = {Agogino, Adrian K. and Tumer, Kagan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2006},
  pages     = {1438-1443},
  url       = {https://mlanthology.org/aaai/2006/agogino2006aaai-quicr/}
}