A Multi-Agent Policy-Gradient Approach to Network Routing

Abstract

Network routing is a distributed decision problem that naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a policy-gradient reinforcement learning algorithm, was successfully applied to simulated network routing under a number of network models. Multiple distributed agents (routers) learned cooperative behavior without explicit inter-agent communication, avoiding behavior that was individually desirable but detrimental to the group's overall performance. Furthermore, shaping the reward signal by explicitly penalizing certain patterns of sub-optimal behavior was found to dramatically improve the convergence rate.
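The online update at the heart of an OLPOMDP-style learner can be sketched as follows. This is an illustrative toy, not the paper's implementation: the softmax policy, the single-state "routing choice" environment, the hyperparameters `alpha` and `beta`, and the class name `OLPOMDPAgent` are all assumptions made here. Each agent keeps an eligibility trace `z` of score functions ∇log π(a|s; θ), decayed by β, and nudges its parameters by α·r·z after every reward.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class OLPOMDPAgent:
    """Toy online policy-gradient learner in the spirit of OLPOMDP.
    Illustrative sketch only; names and hyperparameters are assumptions."""

    def __init__(self, n_states, n_actions, alpha=0.05, beta=0.5, seed=0):
        self.theta = np.zeros((n_states, n_actions))  # policy parameters
        self.z = np.zeros_like(self.theta)            # eligibility trace
        self.alpha = alpha                            # step size
        self.beta = beta                              # trace decay in [0, 1)
        self.rng = np.random.default_rng(seed)

    def act(self, s):
        p = softmax(self.theta[s])
        a = int(self.rng.choice(len(p), p=p))
        # Score function for a softmax policy:
        # d/dtheta[s] log pi(a | s) = e_a - p.
        g = -p
        g[a] += 1.0
        self.z *= self.beta   # decay all traces toward zero
        self.z[s] += g        # accumulate the current score function
        return a

    def update(self, r):
        # Online gradient-ascent step on long-run average reward.
        self.theta += self.alpha * r * self.z

# Toy "routing" decision: action 0 is the short queue (reward 1),
# action 1 the long queue (reward 0); the agent should learn action 0.
agent = OLPOMDPAgent(n_states=1, n_actions=2)
for _ in range(3000):
    a = agent.act(0)
    agent.update(1.0 if a == 0 else 0.0)
```

The reward shaping the abstract describes would correspond, in this sketch, to subtracting a penalty from `r` whenever a known sub-optimal pattern (such as forwarding a packet away from its destination) is detected, which steepens the gradient signal around those patterns.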

Cite

Text

Tao et al. "A Multi-Agent Policy-Gradient Approach to Network Routing." International Conference on Machine Learning, 2001. doi:10.48550/arXiv.2512.03211

Markdown

[Tao et al. "A Multi-Agent Policy-Gradient Approach to Network Routing." International Conference on Machine Learning, 2001.](https://mlanthology.org/icml/2001/tao2001icml-multi/) doi:10.48550/arXiv.2512.03211

BibTeX

@inproceedings{tao2001icml-multi,
  title     = {{A Multi-Agent Policy-Gradient Approach to Network Routing}},
  author    = {Tao, Nigel and Baxter, Jonathan and Weaver, Lex},
  booktitle = {International Conference on Machine Learning},
  year      = {2001},
  pages     = {553--560},
  doi       = {10.48550/arXiv.2512.03211},
  url       = {https://mlanthology.org/icml/2001/tao2001icml-multi/}
}