Learning Team Strategies: Soccer Case Studies

Abstract

We use simulated soccer to study multiagent learning. Each team's players (agents) share an action set and a policy, but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively when goals are scored. We conduct simulations with varying team sizes, and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE version that learns by coevolution (CO-PIPE). TD-Q is based on learning evaluation functions (EFs) mapping input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly. They use adaptive probability distributions to synthesize programs that calculate action probabilities from current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE and CO-PIPE, however, do not depend on EFs and find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.
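
To make the idea of a shared, linear evaluation function concrete, here is a minimal sketch of a one-step Q-learning update with one weight vector per action. It is an illustration only, not the authors' implementation: the class name `LinearQ`, the parameters `lr` and `gamma`, and the plain one-step update rule are all assumptions, and the abstract does not specify the exact TD variant or network details used in the paper.

```python
class LinearQ:
    """Hypothetical linear evaluation function: Q(x, a) = w[a] . x."""

    def __init__(self, n_inputs, n_actions, lr=0.1, gamma=0.9):
        # One weight vector per action, all initialised to zero.
        self.w = [[0.0] * n_inputs for _ in range(n_actions)]
        self.lr = lr        # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)

    def q_value(self, x, action):
        # Dot product of the action's weights with the input vector.
        return sum(wi * xi for wi, xi in zip(self.w[action], x))

    def update(self, x, action, reward, x_next, done):
        # One-step TD target: reward plus the discounted best next value.
        target = reward
        if not done:
            target += self.gamma * max(
                self.q_value(x_next, b) for b in range(len(self.w))
            )
        td_error = target - self.q_value(x, action)
        # Gradient step on the chosen action's weights only.
        self.w[action] = [
            wi + self.lr * td_error * xi
            for wi, xi in zip(self.w[action], x)
        ]
        return td_error
```

In a team setting like the one described, every agent on a team would call `update` on the same `LinearQ` instance with its own position-dependent input `x` and the team's collective reward, so all agents shape one shared EF.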

Cite

Text

Salustowicz et al. "Learning Team Strategies: Soccer Case Studies." Machine Learning, 1998. doi:10.1023/A:1007570708568

Markdown

[Salustowicz et al. "Learning Team Strategies: Soccer Case Studies." Machine Learning, 1998.](https://mlanthology.org/mlj/1998/salustowicz1998mlj-learning/) doi:10.1023/A:1007570708568

BibTeX

@article{salustowicz1998mlj-learning,
  title     = {{Learning Team Strategies: Soccer Case Studies}},
  author    = {Salustowicz, Rafal and Wiering, Marco A. and Schmidhuber, Jürgen},
  journal   = {Machine Learning},
  year      = {1998},
  pages     = {263--282},
  doi       = {10.1023/A:1007570708568},
  volume    = {33},
  url       = {https://mlanthology.org/mlj/1998/salustowicz1998mlj-learning/}
}