A Reinforcement Learning Algorithm Applied to Simplified Two-Player Texas Hold'em Poker
Abstract
We point out that value-based reinforcement learning, such as TD- and Q-learning, is not applicable to games of imperfect information. We give a reinforcement learning algorithm for two-player poker based on gradient search in the agents’ parameter spaces. The two competing agents experiment with different strategies, and simultaneously shift their probability distributions towards more successful actions. The algorithm is a special case of the lagging anchor algorithm, to appear in the journal Machine Learning. We test the algorithm on a simplified, yet non-trivial, version of two-player Hold’em poker, with good results.
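The core idea of the abstract — two competing agents doing gradient search in parameter space, stabilized by a slowly trailing "anchor" — can be illustrated with a minimal sketch on matching pennies, a zero-sum matrix game whose only equilibrium is a mixed strategy. This is an assumption-laden toy, not the paper's poker setup: the game, the sigmoid parameterization, the step-size constants, and the use of exact expected-payoff gradients (instead of the sampled updates an RL agent would use) are all choices made here for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Matching pennies (illustrative stand-in for poker): player 1 plays heads
# with probability p = sigmoid(theta1), player 2 with q = sigmoid(theta2).
# Player 1's expected payoff is (2p - 1)(2q - 1); the game is zero-sum.
theta1, theta2 = 2.0, -1.0          # arbitrary starting parameters
anchor1, anchor2 = theta1, theta2   # lagging anchors start at the parameters
eta = 0.1    # gradient step size
alpha = 0.1  # pull of each agent's parameters toward its anchor
beta = 0.05  # slow drift of the anchor toward the current parameters

for _ in range(20000):
    p, q = sigmoid(theta1), sigmoid(theta2)
    # Exact gradients of each player's expected payoff w.r.t. its logit
    # (chain rule through the sigmoid); plain gradient ascent alone would
    # cycle around the mixed equilibrium instead of converging.
    g1 = 2.0 * (2.0 * q - 1.0) * p * (1.0 - p)
    g2 = -2.0 * (2.0 * p - 1.0) * q * (1.0 - q)
    # Lagging-anchor-style update: gradient step plus a pull toward the anchor.
    theta1 += eta * g1 + alpha * (anchor1 - theta1)
    theta2 += eta * g2 + alpha * (anchor2 - theta2)
    # The anchor slowly follows the parameters.
    anchor1 += beta * (theta1 - anchor1)
    anchor2 += beta * (theta2 - anchor2)

print(round(sigmoid(theta1), 2), round(sigmoid(theta2), 2))  # both near 0.5
```

The anchor term damps the rotational dynamics that pure self-play gradient ascent exhibits in zero-sum games, so both probability distributions spiral in toward the mixed equilibrium (heads with probability 1/2) rather than orbiting it.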
Cite
Text
Dahl. "A Reinforcement Learning Algorithm Applied to Simplified Two-Player Texas Hold'em Poker." European Conference on Machine Learning, 2001. doi:10.1007/3-540-44795-4_8
Markdown
[Dahl. "A Reinforcement Learning Algorithm Applied to Simplified Two-Player Texas Hold'em Poker." European Conference on Machine Learning, 2001.](https://mlanthology.org/ecmlpkdd/2001/dahl2001ecml-reinforcement/) doi:10.1007/3-540-44795-4_8
BibTeX
@inproceedings{dahl2001ecml-reinforcement,
title = {{A Reinforcement Learning Algorithm Applied to Simplified Two-Player Texas Hold'em Poker}},
author = {Dahl, Fredrik A.},
booktitle = {European Conference on Machine Learning},
year = {2001},
pages = {85--96},
doi = {10.1007/3-540-44795-4_8},
url = {https://mlanthology.org/ecmlpkdd/2001/dahl2001ecml-reinforcement/}
}