Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming

NeurIPS 1990 pp. 471-478


Abstract

This is a summary of results with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned forward model of the world. We describe and show results for two Dyna architectures, Dyna-AHC and Dyna-Q. Using a navigation task, results are shown for a simple Dyna-AHC system which simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. We show that Dyna-Q architectures (based on Watkins's Q-learning) are easy to adapt for use in changing environments.
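The alternation the abstract describes (act in the world, update a learned model, then plan on that model) can be illustrated with a minimal tabular sketch in the spirit of Dyna-Q. It is not the paper's exact algorithm: the grid world `make_grid`, the function `dyna_q`, the deterministic last-seen model, the epsilon-greedy exploration, and all parameter values are assumptions chosen for illustration.

```python
import random

def make_grid(width=5, height=5, goal=(4, 4)):
    """Hypothetical navigation task: reward 1.0 on reaching the goal cell."""
    moves = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def step(state, action):
        dx, dy = moves[action]
        x = min(max(state[0] + dx, 0), width - 1)   # walls clamp movement
        y = min(max(state[1] + dy, 0), height - 1)
        nxt = (x, y)
        return (1.0 if nxt == goal else 0.0), nxt, nxt == goal

    return step, list(moves)

def dyna_q(step, start, actions, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, max_steps=1000, seed=0):
    rng = random.Random(seed)
    q = {}      # (state, action) -> estimated return
    model = {}  # (state, action) -> last observed (reward, next_state, done)

    def value(s, a):
        return q.get((s, a), 0.0)

    def greedy(s):
        best = max(value(s, a) for a in actions)
        return rng.choice([a for a in actions if value(s, a) == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup toward r + gamma * max_b Q(s2, b).
        target = r if done else r + gamma * max(value(s2, b) for b in actions)
        q[(s, a)] = value(s, a) + alpha * (target - value(s, a))

    steps_per_episode = []
    for _ in range(episodes):
        s, done, t = start, False, 0
        while not done and t < max_steps:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)        # learn from real experience
            model[(s, a)] = (r, s2, done)    # update the world model
            for _ in range(planning_steps):  # plan on simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s, t = s2, t + 1
        steps_per_episode.append(t)
    return q, steps_per_episode

step, actions = make_grid()
q, steps = dyna_q(step, (0, 0), actions)
print("steps in first and last episode:", steps[0], steps[-1])
```

Setting `planning_steps=0` reduces the sketch to plain one-step Q-learning, which gives a simple baseline for seeing what the model-based backups contribute on this toy task.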


Cite

Text

Sutton. "Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming." Neural Information Processing Systems, 1990.

Markdown

[Sutton. "Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming." Neural Information Processing Systems, 1990.](https://mlanthology.org/neurips/1990/sutton1990neurips-integrated/)

BibTeX

@inproceedings{sutton1990neurips-integrated,
  title     = {{Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming}},
  author    = {Sutton, Richard S.},
  booktitle = {Neural Information Processing Systems},
  year      = {1990},
  pages     = {471-478},
  url       = {https://mlanthology.org/neurips/1990/sutton1990neurips-integrated/}
}