Improving Reinforcement Learning Function Approximators via Neuroevolution

Abstract

Reinforcement learning problems are commonly tackled with temporal difference methods, which estimate the long-term value of taking each action in each state. In most problems of real-world interest, learning this value function requires a function approximator. However, the feasibility of using function approximators depends on the human designer's ability to select an appropriate representation for the value function. My thesis presents a new approach to function approximation that automates some of these difficult design choices by coupling temporal difference methods with policy search methods such as evolutionary computation. It also presents a particular implementation that combines NEAT, a neuroevolutionary policy search method, with Q-learning, a popular temporal difference method, to yield a new method called NEAT+Q, which automatically learns effective representations for neural network function approximators. Empirical results in a server job scheduling task demonstrate that NEAT+Q can outperform both NEAT and Q-learning with manually designed neural networks.
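The core idea in the abstract, evolving networks whose fitness is measured while they are simultaneously refined by Q-learning, can be illustrated with a heavily simplified sketch. This is not the paper's implementation: real NEAT also evolves network topology, whereas this sketch evolves only the weights of a fixed tabular-style net, and the two-state toy environment is a made-up stand-in for the server job scheduling task.

```python
import random

# Minimal sketch of the NEAT+Q idea under strong simplifying assumptions:
# each member of an evolved population is a small network used as a
# Q-value approximator, and during its fitness evaluation it is also
# refined by temporal difference (Q-learning) updates. Mutation-only
# evolution with truncation selection stands in for full NEAT.

N_STATES, N_ACTIONS = 2, 2
ALPHA, GAMMA, EPSILON = 0.05, 0.9, 0.1

def new_net():
    # a single linear layer: weights[state][action] serve as Q estimates
    return [[random.uniform(-0.1, 0.1) for _ in range(N_ACTIONS)]
            for _ in range(N_STATES)]

def q_values(net, s):
    return net[s]

def step(s, a):
    # toy dynamics (hypothetical): action 1 in state 1 pays off,
    # action 1 moves toward state 1, everything else yields nothing
    reward = 1.0 if (s == 1 and a == 1) else 0.0
    next_s = 1 if a == 1 else 0
    return next_s, reward

def evaluate(net, episodes=20, horizon=10):
    """Fitness evaluation that ALSO applies Q-learning updates in place."""
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            qs = q_values(net, s)
            a = (random.randrange(N_ACTIONS) if random.random() < EPSILON
                 else qs.index(max(qs)))           # epsilon-greedy action
            s2, r = step(s, a)
            target = r + GAMMA * max(q_values(net, s2))
            net[s][a] += ALPHA * (target - net[s][a])  # TD update on weights
            total += r
            s = s2
    return total

def mutate(net, sigma=0.05):
    return [[w + random.gauss(0.0, sigma) for w in row] for row in net]

def neat_q(pop_size=10, generations=15):
    # evolve the population; learned weight changes persist across
    # generations (a Lamarckian variant, one option studied in this line of work)
    population = [new_net() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        elite = scored[: pop_size // 2]
        population = elite + [mutate(random.choice(elite))
                              for _ in range(pop_size - len(elite))]
    return max(population, key=evaluate)

best = neat_q()
```

After a few generations, the surviving networks both earn reward during evaluation and carry the Q-values learned along the way, which is the intended synergy: evolution searches over representations while temporal difference learning tunes the values each representation encodes.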

Cite

Text

Whiteson. "Improving Reinforcement Learning Function Approximators via Neuroevolution." AAAI Conference on Artificial Intelligence, 2005. doi:10.1145/1082473.1082794

Markdown

[Whiteson. "Improving Reinforcement Learning Function Approximators via Neuroevolution." AAAI Conference on Artificial Intelligence, 2005.](https://mlanthology.org/aaai/2005/whiteson2005aaai-improving/) doi:10.1145/1082473.1082794

BibTeX

@inproceedings{whiteson2005aaai-improving,
  title     = {{Improving Reinforcement Learning Function Approximators via Neuroevolution}},
  author    = {Whiteson, Shimon},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2005},
  pages     = {1666--1667},
  doi       = {10.1145/1082473.1082794},
  url       = {https://mlanthology.org/aaai/2005/whiteson2005aaai-improving/}
}