TD-MPC2: Scalable, Robust World Models for Continuous Control

Abstract

TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://nicklashansen.github.io/td-mpc2
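The core idea mentioned above, local trajectory optimization in the latent space of a decoder-free world model, can be conveyed with a toy sketch. Note this is an illustration only: TD-MPC2's actual planner is MPPI over a learned latent model with value bootstrapping, whereas the code below uses random shooting over a hand-coded 1-D "latent" dynamics and reward, all of which are made-up stand-ins.

```python
import random

# Toy stand-ins for the learned latent model (NOT the real TD-MPC2 components):
def dynamics(z, a):
    # hypothetical latent transition function
    return 0.9 * z + a

def reward(z, a):
    # hypothetical reward: drive the latent state toward 0, penalize large actions
    return -(z ** 2) - 0.01 * (a ** 2)

def plan(z0, horizon=5, samples=256, seed=0):
    """Random-shooting MPC sketch: sample action sequences, roll each out
    entirely in latent space (no decoder), and return the first action of
    the highest-return sequence."""
    rng = random.Random(seed)
    best_return, best_first_action = float("-inf"), 0.0
    for _ in range(samples):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        z, ret = z0, 0.0
        for a in seq:
            ret += reward(z, a)
            z = dynamics(z, a)
        if ret > best_return:
            best_return, best_first_action = ret, seq[0]
    return best_first_action

# Plan from a latent state, execute only the first action (MPC-style),
# then replan at the next step with fresh latent state.
a0 = plan(z0=1.0)
```

In the real algorithm the dynamics, reward, and value functions are jointly learned networks, and planning re-runs at every control step; the sketch keeps only the "optimize a short action sequence inside the latent model" structure.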

Cite

Text

Hansen et al. "TD-MPC2: Scalable, Robust World Models for Continuous Control." NeurIPS 2023 Workshops: FMDM, 2023.

Markdown

[Hansen et al. "TD-MPC2: Scalable, Robust World Models for Continuous Control." NeurIPS 2023 Workshops: FMDM, 2023.](https://mlanthology.org/neuripsw/2023/hansen2023neuripsw-tdmpc2/)

BibTeX

@inproceedings{hansen2023neuripsw-tdmpc2,
  title     = {{TD-MPC2: Scalable, Robust World Models for Continuous Control}},
  author    = {Hansen, Nicklas and Su, Hao and Wang, Xiaolong},
  booktitle = {NeurIPS 2023 Workshops: FMDM},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/hansen2023neuripsw-tdmpc2/}
}