Meta-Trained Agents Implement Bayes-Optimal Agents
Abstract
Memory-based meta-learning is a powerful technique for building agents that adapt quickly to any task within a target distribution. A previous theoretical study argued that this remarkable performance arises because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike but also share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning is a general technique for numerically approximating Bayes-optimal agents; that is, it works even for task distributions for which we currently don't possess tractable models.
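To make the notion of a Bayes-optimal agent concrete, the following is a minimal illustrative sketch (not code from the paper) of the kind of reference agent the meta-learner is compared against: an exact Bayes-optimal predictor for binary sequences under a Beta prior over the unknown bias. A meta-trained memory-based agent (e.g. an RNN) trained on the same task distribution would be expected to converge to approximately these posterior-predictive outputs. The function names and parameter choices here are our own assumptions for illustration.

```python
import random


def bayes_optimal_prediction(heads, tails, alpha=1.0, beta=1.0):
    """Posterior-predictive P(next = 1) under a Beta(alpha, beta) prior
    over the unknown coin bias; alpha = beta = 1 gives Laplace's rule."""
    return (heads + alpha) / (heads + tails + alpha + beta)


def simulate(num_steps=100, true_bias=0.7, seed=0):
    """Run the Bayes-optimal predictor online on one sampled task
    (a coin with fixed, unknown bias) and return its predictions."""
    rng = random.Random(seed)
    heads = tails = 0
    preds = []
    for _ in range(num_steps):
        preds.append(bayes_optimal_prediction(heads, tails))
        outcome = 1 if rng.random() < true_bias else 0
        heads += outcome
        tails += 1 - outcome
    return preds
```

The predictor starts at the prior mean (0.5) and, as observations accumulate, its output concentrates on the task's true bias; matching a meta-trained agent's outputs against this sequence is one way to compare the two agents' behaviour step by step.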
Cite
Text
Mikulik et al. "Meta-Trained Agents Implement Bayes-Optimal Agents." Neural Information Processing Systems, 2020.
Markdown
[Mikulik et al. "Meta-Trained Agents Implement Bayes-Optimal Agents." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/mikulik2020neurips-metatrained/)
BibTeX
@inproceedings{mikulik2020neurips-metatrained,
title = {{Meta-Trained Agents Implement Bayes-Optimal Agents}},
author = {Mikulik, Vladimir and Delétang, Grégoire and McGrath, Tom and Genewein, Tim and Martic, Miljan and Legg, Shane and Ortega, Pedro},
booktitle = {Neural Information Processing Systems},
year = {2020},
url = {https://mlanthology.org/neurips/2020/mikulik2020neurips-metatrained/}
}