Environmental Statistics and the Trade-Off Between Model-Based and TD Learning in Humans
Abstract
There is much evidence that humans and other animals utilize a combination of model-based and model-free reinforcement learning (RL) methods. Although it has been proposed that these systems may dominate according to their relative statistical efficiency in different circumstances, there is little specific evidence, especially in humans, as to the details of this trade-off. Accordingly, we examine the relative performance of different RL approaches under situations in which the statistics of reward are differentially noisy and volatile. Using theory and simulation, we show that model-free TD learning is relatively most disadvantaged in cases of high volatility and low noise. We present data from a decision-making experiment manipulating these parameters, showing that humans shift learning strategies in accord with these predictions. The statistical circumstances favoring model-based RL are also those that promote a high learning rate, which helps explain why, in psychology, the distinction between these strategies is traditionally conceived in terms of rule-based vs. incremental learning.
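The trade-off described above can be illustrated with a small simulation. The sketch below is not the authors' code; the chain task, the parameter values, and the shared step size alpha are illustrative assumptions. It contrasts a TD(0) learner, which must propagate a drifting terminal reward back through a known deterministic chain roughly one state per episode, with a model-based learner that updates a reward estimate and reads the start-state value directly off the known structure. Under these assumptions, the gap between the two is largest when the reward drifts quickly (high volatility) and observations are clean (low noise).

import numpy as np

def simulate(volatility, noise, n_states=5, alpha=0.3,
             n_episodes=2000, seed=0):
    """Compare TD(0) and a model-based learner tracking a drifting reward.

    A deterministic chain of n_states states ends in a terminal reward whose
    latent mean follows a Gaussian random walk (std = volatility); each
    observed reward adds Gaussian noise (std = noise). Both learners use the
    same step size alpha; the model-based learner exploits the known chain
    structure, so its start-state value tracks the reward estimate directly,
    while TD must propagate it back state by state. Returns the mean squared
    error of each learner's start-state value against the latent reward.
    """
    rng = np.random.default_rng(seed)
    latent = 0.0
    V_td = np.zeros(n_states + 1)      # TD values; index n_states is the terminal state
    r_hat = 0.0                        # model-based estimate of the terminal reward
    err_td, err_mb = [], []
    for _ in range(n_episodes):
        latent += rng.normal(0.0, volatility)      # reward drifts (volatility)
        reward = latent + rng.normal(0.0, noise)   # noisy observation
        # TD(0): sweep the chain; only the final transition carries reward
        for s in range(n_states):
            r = reward if s == n_states - 1 else 0.0
            V_td[s] += alpha * (r + V_td[s + 1] - V_td[s])
        # Model-based: update the reward estimate; start-state value follows it
        r_hat += alpha * (reward - r_hat)
        err_td.append((V_td[0] - latent) ** 2)
        err_mb.append((r_hat - latent) ** 2)
    return np.mean(err_td), np.mean(err_mb)

for label, vol, noi in [("high volatility / low noise", 1.0, 0.1),
                        ("low volatility / high noise", 0.1, 1.0)]:
    td, mb = simulate(vol, noi)
    print(f"{label}: TD MSE = {td:.3f}, model-based MSE = {mb:.3f}")

In this toy setting the model-based learner's advantage is pronounced in the high-volatility, low-noise regime and much smaller in the low-volatility, high-noise regime, mirroring the qualitative prediction stated in the abstract.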
Cite
Text
Simon and Daw. "Environmental Statistics and the Trade-Off Between Model-Based and TD Learning in Humans." Neural Information Processing Systems, 2011.

Markdown
[Simon and Daw. "Environmental Statistics and the Trade-Off Between Model-Based and TD Learning in Humans." Neural Information Processing Systems, 2011.](https://mlanthology.org/neurips/2011/simon2011neurips-environmental/)

BibTeX
@inproceedings{simon2011neurips-environmental,
  title     = {{Environmental Statistics and the Trade-Off Between Model-Based and TD Learning in Humans}},
  author    = {Simon, Dylan A. and Daw, Nathaniel D.},
  booktitle = {Neural Information Processing Systems},
  year      = {2011},
  pages     = {127--135},
  url       = {https://mlanthology.org/neurips/2011/simon2011neurips-environmental/}
}