Learning In-Context Decision Making with Synthetic MDPs

Abstract

Current AI models are trained on huge datasets of real-world data. This is increasingly true in reinforcement learning (RL), where generalist agents are trained on data from hundreds of real environments. It is commonly assumed that real data and environments are the only way to capture the intricate complexities of real-world RL tasks. In this paper, we challenge this notion by training generalist in-context decision-making agents solely on data generated by simple random processes. We investigate data generated from eight families of synthetic environments, ranging from Markov chains and bandits to discrete, continuous, and hybrid Markov decision processes (MDPs). Surprisingly, the resulting agents perform comparably to agents trained on real environment data. We additionally analyze which properties of the pretraining MDPs are best for creating good agents, giving RL practitioners insight into choosing which environments to train on.
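To make the idea of "data generated by simple random processes" concrete, here is a minimal sketch of one plausible synthetic family: a discrete tabular MDP with a randomly sampled transition kernel and reward table. The function name `sample_random_mdp` and the specific priors (Dirichlet transitions, Gaussian rewards) are illustrative assumptions, not the paper's actual generators.

```python
import numpy as np

def sample_random_mdp(n_states: int, n_actions: int, rng: np.random.Generator):
    """Sample one random tabular MDP (illustrative sketch only)."""
    # Transition kernel P[s, a] is a categorical distribution over next
    # states, drawn from a uniform Dirichlet prior for each (s, a) pair.
    transitions = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    # Rewards are drawn i.i.d. from a standard normal for each (s, a) pair.
    rewards = rng.standard_normal((n_states, n_actions))
    return transitions, rewards

rng = np.random.default_rng(0)
P, R = sample_random_mdp(n_states=10, n_actions=4, rng=rng)
assert np.allclose(P.sum(axis=-1), 1.0)  # each row is a valid distribution
```

Resampling such MDPs yields an endless stream of cheap pretraining tasks; the other synthetic families in the paper (Markov chains, bandits, continuous and hybrid MDPs) can be viewed as variations on this sampling scheme.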

Cite

Text

Kumar et al. "Learning In-Context Decision Making with Synthetic MDPs." ICML 2024 Workshops: AutoRL, 2024.

Markdown

[Kumar et al. "Learning In-Context Decision Making with Synthetic MDPs." ICML 2024 Workshops: AutoRL, 2024.](https://mlanthology.org/icmlw/2024/kumar2024icmlw-learning/)

BibTeX

@inproceedings{kumar2024icmlw-learning,
  title     = {{Learning In-Context Decision Making with Synthetic MDPs}},
  author    = {Kumar, Akarsh and Lu, Chris and Kirsch, Louis and Isola, Phillip},
  booktitle = {ICML 2024 Workshops: AutoRL},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/kumar2024icmlw-learning/}
}