Children Prioritize Purely Exploratory Actions in Observe-Vs.-Bet Tasks
Abstract
In reinforcement learning, agents often need to decide between selecting actions that are familiar and have previously yielded positive results (exploitation) and seeking new information that could allow them to uncover more effective actions (exploration). Understanding the specific kinds of heuristics and strategies that humans employ to solve this problem over the course of their development remains an open question in cognitive science and AI. In this study, we develop an "observe or bet" task that separates "pure exploration" from "pure exploitation." Participants have the option to either observe an instance of an outcome and receive no reward, or to bet on an action that is eventually rewarding but offers no immediate feedback. We collected data from 56 five- to seven-year-old children who completed the task at one of three different probability levels. We compared the children's performance against both approximate solutions to the partially observable Markov decision process and meta-RL models that were meta-trained on the same decision-making task across different probability levels. We found that the children observe significantly more than the two classes of algorithms. We then quantified how children's policies differ between the probability levels by fitting probabilistic programming models and by calculating the likelihood of the children's actions under the task-driven model. The fitted parameters of the behavioral model, as well as the direction of the deviation from neural network policies, demonstrate that the primary way children's policies change is in the frequency with which they bet on the door for which they have less evidence. This suggests both that children model the causal structure of the environment and that they produce a "hedging behavior" that would be impossible to detect in standard bandit tasks and that reduces variance in overall rewards.
The results shed light on how children reason about reward and information, providing a developmental benchmark that can help shape our understanding of both human behavior and RL neural network models.
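To make the task structure described in the abstract concrete, here is a minimal sketch of an observe-or-bet environment. It is an illustration only, not the authors' implementation: the two-door setup with doors named "A" and "B", the class `ObserveOrBetTask`, the action names, and the `observe_then_bet` heuristic are all hypothetical, and the probability value used in the usage note is an assumed placeholder rather than one of the study's three levels.

```python
import random


class ObserveOrBetTask:
    """Hypothetical two-door observe-or-bet environment (illustrative only)."""

    def __init__(self, p_door_a, n_trials, seed=0):
        self.p_door_a = p_door_a      # chance the outcome is behind door "A"
        self.n_trials = n_trials
        self.trial = 0
        self.hidden_reward = 0        # accumulated, but never shown mid-task
        self.rng = random.Random(seed)

    def step(self, action):
        """Take 'observe', 'bet_A', or 'bet_B'.

        Observing reveals the outcome but earns nothing. Betting may earn
        reward, but returns None, so the agent gets no immediate feedback.
        """
        if action not in ("observe", "bet_A", "bet_B"):
            raise ValueError(f"unknown action: {action}")
        if self.trial >= self.n_trials:
            raise RuntimeError("task is over")
        outcome = "A" if self.rng.random() < self.p_door_a else "B"
        self.trial += 1
        if action == "observe":
            return outcome
        if action == f"bet_{outcome}":
            self.hidden_reward += 1
        return None


def observe_then_bet(task, n_observe):
    """Assumed baseline heuristic: observe n_observe times, then always
    bet on the door seen more often (ties go to door "A")."""
    counts = {"A": 0, "B": 0}
    n_observed = 0
    while task.trial < task.n_trials:
        if n_observed < n_observe:
            counts[task.step("observe")] += 1
            n_observed += 1
        else:
            door = "A" if counts["A"] >= counts["B"] else "B"
            task.step(f"bet_{door}")
    return task.hidden_reward  # revealed only once the task has ended
```

For example, `observe_then_bet(ObserveOrBetTask(p_door_a=0.8, n_trials=50), n_observe=5)` runs one episode of this sketch. Each observation costs a trial that could have been a bet, which is the trade-off between pure exploration and pure exploitation that the task is designed to isolate.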
Cite
Text
Yiu et al. "Children Prioritize Purely Exploratory Actions in Observe-Vs.-Bet Tasks." NeurIPS 2023 Workshops: IMOL, 2023.
Markdown
[Yiu et al. "Children Prioritize Purely Exploratory Actions in Observe-Vs.-Bet Tasks." NeurIPS 2023 Workshops: IMOL, 2023.](https://mlanthology.org/neuripsw/2023/yiu2023neuripsw-children/)
BibTeX
@inproceedings{yiu2023neuripsw-children,
title = {{Children Prioritize Purely Exploratory Actions in Observe-Vs.-Bet Tasks}},
author = {Yiu, Eunice and Sandbrink, Kai and Gopnik, Alison},
booktitle = {NeurIPS 2023 Workshops: IMOL},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/yiu2023neuripsw-children/}
}