Learning to Play Hearts

Abstract

The success of neural networks and temporal difference methods in complex tasks such as in (Tesauro 1992) provides the opportunity to apply these methods in other game-playing domains. I compared two learning architectures, supervised learning and temporal difference learning, for the game of Hearts.

Supervised Learning Framework. This version employs a supervised learning algorithm. There are four evaluating networks, one for each suit. If a suit is legal to play, the corresponding network is evaluated on all legal plays in that suit. The card with the highest evaluation is returned as the best candidate in the suit. If several suits could be played, the card with the highest value across all suits is selected. Once the trick is completed, we calculate its value by summing all point-bearing cards in the trick. The neural network is updated with the target value being the trick value: E = v_{t+k} - Q(s_cur, a_cur), where Q(s_cur, a_cur) is the output of the network for the current state and action. After the error is calculated, the standard backpropagation procedure (Rumelhart, Hinton, & Williams 1986) is applied, moving the weights of the network in the error-minimizing direction.
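The per-suit evaluation scheme described above can be sketched in a few lines. The network sizes, learning rate, and the one-hot rank encoding below are illustrative assumptions, not details from the paper; only the structure (one evaluator per suit, argmax over legal plays, backprop toward the trick value) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 13  # assumption: one-hot rank of the candidate card

class SuitNet:
    """One small feedforward evaluator per suit, trained by backpropagation."""
    def __init__(self, n_in=N_FEATURES, n_hidden=8, lr=0.1):
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))
        self.W2 = rng.normal(0, 0.1, n_hidden)
        self.lr = lr

    def forward(self, x):
        # Cache activations for the subsequent backprop step.
        self.x = x
        self.h = np.tanh(self.W1 @ x)
        self.q = float(self.W2 @ self.h)
        return self.q

    def update(self, target):
        # E = v_{t+k} - Q(s_cur, a_cur): gradient step on the squared error,
        # moving the weights in the error-minimizing direction.
        err = target - self.q
        dh = err * self.W2 * (1.0 - self.h ** 2)
        self.W2 += self.lr * err * self.h
        self.W1 += self.lr * np.outer(dh, self.x)
        return err

nets = {suit: SuitNet() for suit in "CDHS"}  # one network per suit

def encode(rank):
    x = np.zeros(N_FEATURES)
    x[rank] = 1.0
    return x

def choose_card(legal):
    """legal: list of (suit, rank). Evaluate each card with its suit's
    network and return the card with the highest value across all suits."""
    return max(legal, key=lambda c: nets[c[0]].forward(encode(c[1])))

# Once the trick completes, sum the point-bearing cards to get its value
# and train the network that chose the card toward that target.
card = choose_card([("H", 3), ("H", 10), ("S", 11)])
nets[card[0]].forward(encode(card[1]))   # re-evaluate the chosen card
nets[card[0]].update(target=-1.0)        # e.g. a trick costing one heart
```

The target here is the immediate trick value, as in the paper's supervised variant; a temporal difference variant would instead bootstrap from the next state's evaluation.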

Cite

Text

Kuvayev. "Learning to Play Hearts." AAAI Conference on Artificial Intelligence, 1997.

Markdown

[Kuvayev. "Learning to Play Hearts." AAAI Conference on Artificial Intelligence, 1997.](https://mlanthology.org/aaai/1997/kuvayev1997aaai-learning/)

BibTeX

@inproceedings{kuvayev1997aaai-learning,
  title     = {{Learning to Play Hearts}},
  author    = {Kuvayev, Leo},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1997},
  pages     = {836},
  url       = {https://mlanthology.org/aaai/1997/kuvayev1997aaai-learning/}
}