Improving Reinforcement Learning Function Approximators via Neuroevolution
Abstract
Reinforcement learning problems are commonly tackled with temporal difference methods, which estimate the long-term value of taking each action in each state. In most problems of real-world interest, learning this value function requires a function approximator. However, the feasibility of using function approximators depends on the ability of the human designer to select an appropriate representation for the value function. My thesis presents a new approach to function approximation that automates some of these difficult design choices by coupling temporal difference methods with policy search methods such as evolutionary computation. It also presents a particular implementation that combines NEAT, a neuroevolutionary policy search method, and Q-learning, a popular temporal difference method, to yield a new method called NEAT+Q that automatically learns effective representations for neural network function approximators. Empirical results in a server job scheduling task demonstrate that NEAT+Q can outperform both NEAT and Q-learning with manually designed neural networks.
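The coupling the abstract describes can be illustrated with a minimal, hypothetical Python sketch (not the paper's implementation): an outer evolutionary loop proposes candidate value-function approximators, and each candidate's fitness is the reward it accumulates while Q-learning adjusts its weights. For self-containment, a linear approximator on a toy chain task stands in for NEAT's evolved neural networks and the server job scheduling domain, and a weight-perturbation `mutate` stands in for NEAT's structural mutation operators.

```python
import random

random.seed(0)

N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.1  # assumed hyperparameters


def step(state, action):
    """Toy chain environment: action 1 moves right, action 0 moves left;
    reaching the rightmost state yields reward 1 and ends the episode."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def new_net():
    """A linear 'network': one weight per (action, one-hot state) pair."""
    return [[random.uniform(-0.1, 0.1) for _ in range(N_STATES)]
            for _ in range(N_ACTIONS)]


def q_value(net, s, a):
    return net[a][s]  # one-hot states make the dot product a lookup


def q_learning_fitness(net, episodes=30):
    """Run Q-learning with `net` as the function approximator; the reward
    accumulated serves as the network's evolutionary fitness."""
    total = 0.0
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 20:
            if random.random() < EPS:  # epsilon-greedy exploration
                a = random.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda b: q_value(net, s, b))
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(
                q_value(net, s2, b) for b in range(N_ACTIONS))
            net[a][s] += ALPHA * (target - q_value(net, s, a))  # TD backup
            s, total, steps = s2, total + r, steps + 1
    return total


def mutate(net):
    """Stand-in for NEAT's mutation: perturbs weights only, whereas
    real NEAT also mutates network structure."""
    return [[w + random.gauss(0, 0.05) for w in row] for row in net]


# Outer evolutionary loop: score each candidate by Q-learning performance,
# keep the fittest half, and refill the population with mutated copies.
pop = [new_net() for _ in range(8)]
for gen in range(10):
    pop.sort(key=q_learning_fitness, reverse=True)
    pop = pop[:4] + [mutate(random.choice(pop[:4])) for _ in range(4)]
print("best fitness:", q_learning_fitness(pop[0]))
```

Note that scoring a candidate also updates its weights, so TD-learned improvements survive into the next generation, loosely mirroring the way NEAT+Q lets Q-learning refine each representation that evolution proposes.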
Cite
Text
Whiteson. "Improving Reinforcement Learning Function Approximators via Neuroevolution." AAAI Conference on Artificial Intelligence, 2005. doi:10.1145/1082473.1082794Markdown
[Whiteson. "Improving Reinforcement Learning Function Approximators via Neuroevolution." AAAI Conference on Artificial Intelligence, 2005.](https://mlanthology.org/aaai/2005/whiteson2005aaai-improving/) doi:10.1145/1082473.1082794BibTeX
@inproceedings{whiteson2005aaai-improving,
title = {{Improving Reinforcement Learning Function Approximators via Neuroevolution}},
author = {Whiteson, Shimon},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2005},
pages = {1666-1667},
doi = {10.1145/1082473.1082794},
url = {https://mlanthology.org/aaai/2005/whiteson2005aaai-improving/}
}