Pseudo-Convergent Q-Learning by Competitive Pricebots
Abstract
We study novel aspects of multi-agent Q-learning in a model market in which two identical, competing "pricebots" strategically price a commodity. Two fundamentally different solutions are observed: an exact, stationary solution with zero Bellman error consisting of symmetric policies, and a non-stationary, broken-symmetry pseudo-solution, with small but non-zero Bellman error. This "pseudo-convergent" asymmetric solution has no analog in ordinary Q-learning. We calculate analytically the form of both solutions, and map out numerically the conditions under which each occurs. We suggest that this observed behavior will also be found more generally in other studies of multi-agent Q-learning, and discuss implications and directions for future research.
1. Introduction
Within the next few years, we expect electronic commerce to be an important multi-agent domain in which reinforcement learning will find numerous applications. One such application is automated dynamic pricing...
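As a point of reference for the "Bellman error" that the abstract uses to distinguish the two solutions, the following is a minimal sketch for ordinary single-agent tabular Q-learning, not the paper's two-pricebot model. The toy two-state problem, the names `bellman_error` and `solve_q`, and the choice of the maximum absolute residual as the error measure are all assumptions made for illustration.

```python
import numpy as np

def bellman_error(Q, rewards, next_state, gamma):
    """Largest absolute Bellman residual of a tabular Q-function.

    Q, rewards: float arrays of shape (n_states, n_actions).
    next_state: int array of the same shape (deterministic transitions).
    """
    # Q[next_state] has shape (n_states, n_actions, n_actions);
    # the max over the last axis is the value of each successor state.
    target = rewards + gamma * Q[next_state].max(axis=-1)
    return np.abs(Q - target).max()

def solve_q(rewards, next_state, gamma, sweeps=2000):
    """Repeatedly apply the Bellman backup (assumed toy solver)."""
    Q = np.zeros_like(rewards, dtype=float)
    for _ in range(sweeps):
        Q = rewards + gamma * Q[next_state].max(axis=-1)
    return Q

# Hypothetical toy problem: 2 states, 2 actions; action a moves to state a.
rewards = np.array([[1.0, 0.0], [0.0, 2.0]])
next_state = np.array([[0, 1], [0, 1]])
gamma = 0.5

Q_star = solve_q(rewards, next_state, gamma)
exact_error = bellman_error(Q_star, rewards, next_state, gamma)

# Perturbing one entry leaves a table that no longer satisfies the
# Bellman equation, so its residual is strictly positive.
Q_bad = Q_star.copy()
Q_bad[0, 0] += 0.1
perturbed_error = bellman_error(Q_bad, rewards, next_state, gamma)
```

In this single-agent setting a zero residual characterizes the exact stationary solution. The abstract's pseudo-convergent solution is described as having a small but non-zero Bellman error, which this sketch only mimics by hand-perturbing the table.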
Cite
Text
Kephart and Tesauro. "Pseudo-Convergent Q-Learning by Competitive Pricebots." International Conference on Machine Learning, 2000.
Markdown
[Kephart and Tesauro. "Pseudo-Convergent Q-Learning by Competitive Pricebots." International Conference on Machine Learning, 2000.](https://mlanthology.org/icml/2000/kephart2000icml-pseudo/)
BibTeX
@inproceedings{kephart2000icml-pseudo,
title = {{Pseudo-Convergent Q-Learning by Competitive Pricebots}},
author = {Kephart, Jeffrey O. and Tesauro, Gerald},
booktitle = {International Conference on Machine Learning},
year = {2000},
pages = {463-470},
url = {https://mlanthology.org/icml/2000/kephart2000icml-pseudo/}
}