Environmental Statistics and the Trade-Off Between Model-Based and TD Learning in Humans

Abstract

There is much evidence that humans and other animals use a combination of model-based and model-free reinforcement learning (RL) methods. Although it has been proposed that these systems may dominate in different circumstances according to their relative statistical efficiency, there is little specific evidence, especially in humans, as to the details of this trade-off. Accordingly, we examine the relative performance of different RL approaches in situations where the statistics of reward are differentially noisy and volatile. Using theory and simulation, we show that model-free temporal-difference (TD) learning is most disadvantaged, relative to model-based learning, in cases of high volatility and low noise. We present data from a decision-making experiment manipulating these parameters, showing that humans shift learning strategies in accord with these predictions. The statistical circumstances favoring model-based RL are also those that promote a high learning rate, which helps explain why, in psychology, the distinction between these strategies is traditionally conceived in terms of rule-based vs. incremental learning.
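The abstract's central claim, that bootstrapped TD updates lag behind a drifting reward more than model-based planning does, and that the gap is widest under high volatility and low noise, can be illustrated with a small simulation. The sketch below is not the paper's experimental task or analysis: the two-stage chain, the Gaussian drift and observation noise, the shared learning rate `alpha`, and the parameter values are all illustrative assumptions.

```python
"""Illustrative sketch (not the authors' task): model-based vs. TD(0) learning
on a two-stage chain whose terminal reward mean drifts over trials."""
import numpy as np

rng = np.random.default_rng(0)


def simulate(volatility, noise, alpha=0.3, n_trials=2000):
    """Return mean absolute error of the stage-0 value for TD(0) and model-based learners."""
    true_r = 0.0          # latent mean reward at the terminal state (random walk)
    V_td = np.zeros(2)    # TD(0) values for stage 0 and stage 1
    r_hat_mb = 0.0        # model-based learner's estimate of the terminal reward
    td_err, mb_err = [], []
    for _ in range(n_trials):
        true_r += rng.normal(0.0, volatility)    # reward mean drifts (volatility)
        r_obs = true_r + rng.normal(0.0, noise)  # noisy reward observation
        # TD(0): stage 0 bootstraps off the *old* stage-1 value, so new reward
        # information takes an extra trial to propagate back to the first stage
        V_td[0] += alpha * (V_td[1] - V_td[0])
        V_td[1] += alpha * (r_obs - V_td[1])
        # Model-based: knows stage 0 leads to stage 1, so its stage-0 value is
        # simply its current estimate of the terminal reward
        r_hat_mb += alpha * (r_obs - r_hat_mb)
        td_err.append(abs(V_td[0] - true_r))
        mb_err.append(abs(r_hat_mb - true_r))
    return np.mean(td_err), np.mean(mb_err)


for vol, noise in [(0.5, 0.1), (0.1, 0.5)]:
    td, mb = simulate(vol, noise)
    print(f"volatility={vol}, noise={noise}: TD error={td:.3f}, MB error={mb:.3f}")
```

Run as a script, this typically shows a much larger TD error than model-based error in the high-volatility, low-noise condition, and only a small gap when noise dominates, in line with the prediction stated in the abstract.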

Cite

Text

Simon and Daw. "Environmental Statistics and the Trade-Off Between Model-Based and TD Learning in Humans." Neural Information Processing Systems, 2011.

Markdown

[Simon and Daw. "Environmental Statistics and the Trade-Off Between Model-Based and TD Learning in Humans." Neural Information Processing Systems, 2011.](https://mlanthology.org/neurips/2011/simon2011neurips-environmental/)

BibTeX

@inproceedings{simon2011neurips-environmental,
  title     = {{Environmental Statistics and the Trade-Off Between Model-Based and TD Learning in Humans}},
  author    = {Simon, Dylan A. and Daw, Nathaniel D.},
  booktitle = {Neural Information Processing Systems},
  year      = {2011},
  pages     = {127--135},
  url       = {https://mlanthology.org/neurips/2011/simon2011neurips-environmental/}
}