A Multi-Agent Policy-Gradient Approach to Network Routing
Abstract
Network routing is a distributed decision problem that naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a policy-gradient reinforcement learning algorithm, was successfully applied to simulated network routing under a number of network models. Multiple distributed agents (routers) learned co-operative behavior without explicit inter-agent communication, and they avoided behavior that was individually desirable but detrimental to the group's overall performance. Furthermore, shaping the reward signal by explicitly penalizing certain patterns of sub-optimal behavior was found to dramatically improve the convergence rate.
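The core of OLPOMDP is a single online update: an eligibility trace accumulates discounted score-function gradients, and each step's reward scales that trace into a parameter update. The following is a minimal single-agent sketch on a toy two-state "routing" chain; the environment, step counts, and learning rates are illustrative assumptions, not the network models used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(state, action):
    """Toy stand-in for a router choosing a next hop (illustrative only).
    From the source (state 0), action 0 delivers the packet (reward 1);
    from the destination (state 1), a fresh packet is injected at the source."""
    if state == 1:
        return 0, 0.0
    if action == 0:
        return 1, 1.0
    return 0, 0.0

def olpomdp(theta, alpha=0.05, beta=0.9, n_steps=20000):
    """OLPOMDP per-step update:
       z     <- beta * z + grad log pi(a | s; theta)
       theta <- theta + alpha * r * z
    where z is the eligibility trace and beta in [0, 1) its discount."""
    z = np.zeros_like(theta)
    state = 0
    for _ in range(n_steps):
        probs = softmax(theta[state])
        action = rng.choice(2, p=probs)
        # grad of log pi wrt theta[state] for a softmax policy:
        # one-hot(action) - probs; zero for rows of other states.
        g = np.zeros_like(theta)
        g[state] = -probs
        g[state, action] += 1.0
        z = beta * z + g
        state, reward = step(state, action)
        theta = theta + alpha * reward * z
    return theta

theta = olpomdp(np.zeros((2, 2)))
# After training, the policy at the source state should strongly
# prefer the delivering action.
```

Note the design choice that makes the algorithm fully online: no value function is estimated, so each router can apply this update from its local reward stream alone, which is what lets the paper's agents learn without inter-agent communication.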
Cite
Text
Tao et al. "A Multi-Agent Policy-Gradient Approach to Network Routing." International Conference on Machine Learning, 2001. doi:10.48550/arXiv.2512.03211
Markdown
[Tao et al. "A Multi-Agent Policy-Gradient Approach to Network Routing." International Conference on Machine Learning, 2001.](https://mlanthology.org/icml/2001/tao2001icml-multi/) doi:10.48550/arXiv.2512.03211
BibTeX
@inproceedings{tao2001icml-multi,
title = {{A Multi-Agent Policy-Gradient Approach to Network Routing}},
author = {Tao, Nigel and Baxter, Jonathan and Weaver, Lex},
booktitle = {International Conference on Machine Learning},
year = {2001},
pages = {553--560},
doi = {10.48550/arXiv.2512.03211},
url = {https://mlanthology.org/icml/2001/tao2001icml-multi/}
}