Explanation-Based Learning and Reinforcement Learning: A Unified View
Abstract
In speedup-learning problems, where full descriptions of operators are always known, both explanation-based learning (EBL) and reinforcement learning (RL) can be applied. This paper shows that both methods involve fundamentally the same process of propagating information backward from the goal toward the starting state. RL performs this propagation on a state-by-state basis, while EBL computes the weakest preconditions of operators, and hence performs this propagation on a region-by-region basis. Based on the observation that RL is a form of asynchronous dynamic programming, this paper shows how to develop a dynamic programming version of EBL, which we call Explanation-Based Reinforcement Learning (EBRL). The paper compares batch and online versions of EBRL to batch and online versions of RL and to standard EBL. The results show that EBRL combines the strengths of EBL (fast learning and the ability to scale to large state spaces) with the strengths of RL (learning of optimal policies). Results are shown in chess endgames and in synthetic maze tasks.
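The contrast the abstract draws can be made concrete with a minimal sketch (not the paper's code; the grid world, operator set, and function names here are illustrative assumptions). On a small deterministic grid with a single goal, an RL-style asynchronous dynamic program backs up values one state at a time, while an EBRL-style procedure regresses a whole region of states through the operators at once via their weakest precondition:

```python
# Illustrative sketch, not from the paper: point-based RL backups vs.
# region-based EBRL backups on a W x H grid. The goal is (0, 0); the two
# operators step toward smaller x or smaller y, each with reward -1.
W = H = 4
GOAL = (0, 0)
states = [(x, y) for x in range(W) for y in range(H)]

def succs(s):
    """Deterministic successor states under the two operators."""
    x, y = s
    out = []
    if x > 0:
        out.append((x - 1, y))
    if y > 0:
        out.append((x, y - 1))
    return out

# Point-based asynchronous DP (RL-style): back up one state at a time.
V = {s: float("-inf") for s in states}
V[GOAL] = 0.0
for _ in range(W + H):            # enough sweeps to converge
    for s in states:
        for t in succs(s):
            V[s] = max(V[s], -1.0 + V[t])

# Region-based backups (EBRL-style): one backup covers every state in the
# weakest precondition of reaching the current region, all at once.
def weakest_precondition(region):
    """States from which some operator leads into `region`."""
    return {s for s in states if any(t in region for t in succs(s))}

V_region = {GOAL: 0.0}
region, value = {GOAL}, 0.0
while True:
    region = weakest_precondition(region) - set(V_region)
    if not region:
        break
    value -= 1.0                  # one more unit-cost step from the goal
    for s in region:
        V_region[s] = value

# Both propagation schemes recover the same optimal value function.
assert V == V_region
```

The point-based loop touches each state individually on every sweep, whereas each region-based backup assigns a value to an entire set of states (here, a full Manhattan-distance contour) in a single step, which is the source of EBL's scaling advantage noted in the abstract.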
Cite
Text
Dietterich and Flann. "Explanation-Based Learning and Reinforcement Learning: A Unified View." International Conference on Machine Learning, 1995. doi:10.1016/B978-1-55860-377-6.50030-X
Markdown
[Dietterich and Flann. "Explanation-Based Learning and Reinforcement Learning: A Unified View." International Conference on Machine Learning, 1995.](https://mlanthology.org/icml/1995/dietterich1995icml-explanation/) doi:10.1016/B978-1-55860-377-6.50030-X
BibTeX
@inproceedings{dietterich1995icml-explanation,
title = {{Explanation-Based Learning and Reinforcement Learning: A Unified View}},
author = {Dietterich, Thomas G. and Flann, Nicholas S.},
booktitle = {International Conference on Machine Learning},
year = {1995},
pages = {176-184},
doi = {10.1016/B978-1-55860-377-6.50030-X},
url = {https://mlanthology.org/icml/1995/dietterich1995icml-explanation/}
}