Multi-Agent Q-Learning and Regression Trees for Automated Pricing Decisions
Abstract
We study the use of the reinforcement learning algorithm Q-learning with regression tree function approximation to learn pricing strategies in a competitive marketplace of economic software agents. Q-learning is an algorithm for learning to estimate the long-term expected reward for a given state-action pair. In the case of a stationary environment with a lookup table representing the Q-function, the learning procedure is guaranteed to converge to an optimal policy. However, utilizing Q-learning in multi-agent systems presents special challenges. The simultaneous adaptation of multiple agents creates a non-stationary environment for each agent; hence, there are no theoretical guarantees of convergence or optimality. Also, large multi-agent systems may have state spaces too large to represent with lookup tables, necessitating the use of function approximation.
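The lookup-table case mentioned in the abstract can be illustrated with a minimal sketch of single-agent tabular Q-learning. This is not the authors' pricing setup: the five-state chain environment, the function names `step`, `q_update`, and `train`, and all parameter values are assumptions chosen only to show the update rule in a stationary environment.

```python
import random

def step(state, action, n_states):
    """Hypothetical toy chain: action 1 moves right, action 0 moves left.
    Reaching the last state ends the episode with reward 1."""
    if action == 1:
        next_state = state + 1
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

def q_update(Q, s, a, r, s_next, alpha, gamma):
    """One Q-learning update on a lookup table Q[state][action]."""
    target = r + gamma * max(Q[s_next])   # bootstrapped long-term reward estimate
    Q[s][a] += alpha * (target - Q[s][a])

def train(n_states=5, n_actions=2, episodes=500, alpha=0.1, gamma=0.9, seed=0):
    """Learn Q from uniformly random behavior (Q-learning is off-policy)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:          # the last state is terminal
            a = rng.randrange(n_actions)  # exploratory action choice
            s_next, r = step(s, a, n_states)
            q_update(Q, s, a, r, s_next, alpha, gamma)
            s = s_next
    return Q
```

Calling `train()` returns a table whose greedy action in every non-terminal state is to move right. The setting described in the abstract differs in two ways that this sketch does not cover: several agents adapt at once, so the environment each one sees is non-stationary, and the lookup table is replaced by a regression tree approximation of the Q-function.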
Cite
Text
Sridharan and Tesauro. "Multi-Agent Q-Learning and Regression Trees for Automated Pricing Decisions." International Conference on Machine Learning, 2000. doi:10.1109/ICMAS.2000.858518
Markdown
[Sridharan and Tesauro. "Multi-Agent Q-Learning and Regression Trees for Automated Pricing Decisions." International Conference on Machine Learning, 2000.](https://mlanthology.org/icml/2000/sridharan2000icml-multi/) doi:10.1109/ICMAS.2000.858518
BibTeX
@inproceedings{sridharan2000icml-multi,
title = {{Multi-Agent Q-Learning and Regression Trees for Automated Pricing Decisions}},
author = {Sridharan, Manu and Tesauro, Gerald},
booktitle = {International Conference on Machine Learning},
year = {2000},
pages = {927-934},
doi = {10.1109/ICMAS.2000.858518},
url = {https://mlanthology.org/icml/2000/sridharan2000icml-multi/}
}