Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming
Abstract
This is a summary of results with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned forward model of the world. We describe and show results for two Dyna architectures, Dyna-AHC and Dyna-Q. Using a navigation task, results are shown for a simple Dyna-AHC system which simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. We show that Dyna-Q architectures (based on Watkins's Q-learning) are easy to adapt for use in changing environments.
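The alternation the abstract describes, acting in the world and then replaying a learned model, can be illustrated with a minimal tabular sketch. This is not the paper's code: it assumes a deterministic world, one-step Q-learning updates, and planning by uniformly sampling previously seen state-action pairs. The names `dyna_q` and `corridor_step`, the six-state corridor task, and all parameter values are hypothetical choices made for illustration only.

```python
import random


def corridor_step(state, action):
    """Hypothetical toy world: states 0..5 in a row, action 0 = left, 1 = right.

    Reaching state 5 yields reward 1.0; every other transition yields 0.0.
    """
    next_state = max(0, state - 1) if action == 0 else min(5, state + 1)
    reward = 1.0 if next_state == 5 else 0.0
    return reward, next_state


def dyna_q(n_states, n_actions, step, start, goal, episodes=50,
           planning_steps=10, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch (assumes a deterministic world)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (state, action) -> (reward, next_state), learned from experience

    def greedy(s):
        # Break ties at random so early exploration is not biased.
        best = max(q[s])
        return rng.choice([a for a in range(n_actions) if q[s][a] == best])

    def update(s, a, r, s2):
        # One-step Q-learning backup. The goal's values are never updated,
        # so they stay at 0 and the goal acts as a terminal state.
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])

    for _ in range(episodes):
        s = start
        while s != goal:
            a = rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)
            r, s2 = step(s, a)
            update(s, a, r, s2)        # trial-and-error learning on the world
            model[(s, a)] = (r, s2)    # learning the forward model
            for _ in range(planning_steps):
                # Planning: replay a remembered transition from the model.
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2)
            s = s2
    return q


if __name__ == "__main__":
    q_values = dyna_q(6, 2, corridor_step, start=0, goal=5)
    print([max(range(2), key=lambda a: q_values[s][a]) for s in range(5)])
```

Setting `planning_steps=0` in this sketch reduces it to plain Q-learning, which is one way to see what the model-based replay contributes.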
Cite
Text
Sutton. "Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming." Neural Information Processing Systems, 1990.
Markdown
[Sutton. "Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming." Neural Information Processing Systems, 1990.](https://mlanthology.org/neurips/1990/sutton1990neurips-integrated/)
BibTeX
@inproceedings{sutton1990neurips-integrated,
title = {{Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming}},
author = {Sutton, Richard S.},
booktitle = {Neural Information Processing Systems},
year = {1990},
pages = {471-478},
url = {https://mlanthology.org/neurips/1990/sutton1990neurips-integrated/}
}