Multi-Agent Q-Learning and Regression Trees for Automated Pricing Decisions

Abstract

We study the use of the reinforcement learning algorithm Q-learning, with regression trees as the function approximator, to learn pricing strategies in a competitive marketplace of economic software agents. Q-learning learns to estimate the long-term expected reward for each state-action pair. In a stationary environment where the Q-function is represented by a lookup table, the learning procedure is guaranteed to converge to an optimal policy, provided every state-action pair continues to be visited and the learning rate decays appropriately. However, applying Q-learning in multi-agent systems presents special challenges. Because multiple agents adapt simultaneously, each agent faces a non-stationary environment, so there are no theoretical guarantees of convergence or optimality. In addition, large multi-agent systems may have state spaces too large to represent with lookup tables, which necessitates function approximation.
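The following is a minimal sketch of the lookup-table Q-learning setting described above, applied to two competing sellers. It is not the authors' marketplace model: the price grid `PRICES`, the unit cost `COST`, the winner-take-all demand rule in `profits`, and the helpers `QAgent` and `simulate` are all hypothetical, chosen only to show the update rule and why simultaneous learning makes each agent's environment non-stationary. Each agent's state is simply the opponent's most recent price.

```python
import random

# Hypothetical price grid and unit cost (illustrative only, not from the paper).
PRICES = [0.2, 0.4, 0.6, 0.8, 1.0]
COST = 0.1


def profits(p1: float, p2: float) -> tuple[float, float]:
    """Hypothetical duopoly: all buyers go to the cheaper seller; ties split demand."""
    if p1 < p2:
        return p1 - COST, 0.0
    if p2 < p1:
        return 0.0, p2 - COST
    return (p1 - COST) / 2, (p2 - COST) / 2


class QAgent:
    """Hypothetical tabular Q-learner; state = index of the opponent's last price."""

    def __init__(self, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
        n = len(PRICES)
        self.q = [[0.0] * n for _ in range(n)]  # lookup table: q[state][action]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state: int) -> int:
        """Epsilon-greedy choice of a price index."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(PRICES))  # explore
        row = self.q[state]
        return max(range(len(row)), key=row.__getitem__)  # exploit

    def update(self, s: int, a: int, r: float, s_next: int) -> None:
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_b Q(s',b) - Q(s,a)]
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])


def simulate(steps=20000, seed=0):
    """Two agents learn simultaneously against each other."""
    a1, a2 = QAgent(seed=seed), QAgent(seed=seed + 1)
    s1, s2 = 0, 0  # each agent observes the other's last price index
    for _ in range(steps):
        x1, x2 = a1.act(s1), a2.act(s2)
        r1, r2 = profits(PRICES[x1], PRICES[x2])
        # Each agent's next state is the opponent's current price. Because the
        # opponent's policy keeps changing, these transitions are non-stationary
        # from either agent's point of view.
        a1.update(s1, x1, r1, x2)
        a2.update(s2, x2, r2, x1)
        s1, s2 = x2, x1
    return a1, a2


if __name__ == "__main__":
    agent1, agent2 = simulate()
    for s, price in enumerate(PRICES):
        best = max(range(len(PRICES)), key=agent1.q[s].__getitem__)
        print(f"opponent at {price:.1f} -> agent 1 prices at {PRICES[best]:.1f}")
```

The lookup table here has only 5 × 5 entries; the abstract's point is that realistic multi-agent state spaces are far larger, which is where a function approximator such as a regression tree would replace `self.q`.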

Cite

Text

Sridharan and Tesauro. "Multi-Agent Q-Learning and Regression Trees for Automated Pricing Decisions." International Conference on Machine Learning, 2000. doi:10.1109/ICMAS.2000.858518

Markdown

[Sridharan and Tesauro. "Multi-Agent Q-Learning and Regression Trees for Automated Pricing Decisions." International Conference on Machine Learning, 2000.](https://mlanthology.org/icml/2000/sridharan2000icml-multi/) doi:10.1109/ICMAS.2000.858518

BibTeX

@inproceedings{sridharan2000icml-multi,
  title     = {{Multi-Agent Q-Learning and Regression Trees for Automated Pricing Decisions}},
  author    = {Sridharan, Manu and Tesauro, Gerald},
  booktitle = {International Conference on Machine Learning},
  year      = {2000},
  pages     = {927--934},
  doi       = {10.1109/ICMAS.2000.858518},
  url       = {https://mlanthology.org/icml/2000/sridharan2000icml-multi/}
}