Programmatic Video Prediction Using Large Language Models

Abstract

Estimating a world model that describes the dynamics of a real-world process is important for anticipating and preparing for future outcomes, and finds widespread use in applications such as video surveillance, robotics, and autonomous driving. This task entails synthesizing plausible visual futures given a few frames of a video, which set the visual context for the synthesis. To this end, we propose ProgGen, which performs video frame prediction by synthesizing computer programs that represent the dynamics of the video using a set of neuro-symbolic, human-interpretable states (one per frame), leveraging the inductive biases of Large (Vision) Language Models (LLM/VLM). In particular, ProgGen uses an LLM/VLM to synthesize computer programs that: (i) estimate the states of the video given the visual context (i.e., the frames); (ii) predict the states at future time steps by estimating the transition dynamics; and (iii) render the predicted states as RGB frames. Empirical evaluations show that the proposed method outperforms competing techniques at video frame prediction in two challenging environments: (i) PhyWorld and (ii) Cart Pole. Additionally, ProgGen permits counterfactual reasoning and editability, attesting to its effectiveness and generalizability.
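The three stages named in the abstract (state estimation, transition prediction, rendering) can be illustrated with a minimal toy sketch. This is not the authors' code: in ProgGen these programs are synthesized by an LLM/VLM, whereas here they are written by hand, and the 1-D frames, the `State` class, the constant-velocity dynamics, and all function names are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List

WIDTH = 8  # hypothetical frame width for this toy example

# A toy 1-D "frame": 1 marks the object, 0 is background.
Frame = List[int]


@dataclass
class State:
    """Hypothetical human-interpretable state: one object position per frame."""
    x: int


def estimate_state(frame: Frame) -> State:
    """Stage (i): recover a symbolic state from a frame."""
    return State(x=frame.index(1))


def predict_next(history: List[State]) -> State:
    """Stage (ii): transition dynamics.

    Assumes constant velocity, estimated from the last two states,
    so at least two context frames are required.
    """
    velocity = history[-1].x - history[-2].x
    return State(x=history[-1].x + velocity)


def render(state: State) -> Frame:
    """Stage (iii): render a state back into a frame."""
    frame = [0] * WIDTH
    if 0 <= state.x < WIDTH:  # objects outside the frame are not drawn
        frame[state.x] = 1
    return frame


def predict_frames(context: List[Frame], horizon: int) -> List[Frame]:
    """Chain the three stages to roll out `horizon` future frames."""
    states = [estimate_state(f) for f in context]
    future = []
    for _ in range(horizon):
        nxt = predict_next(states)
        states.append(nxt)
        future.append(render(nxt))
    return future


# Context: the object moves from pixel 0 to pixel 2.
context = [render(State(0)), render(State(2))]
future = predict_frames(context, horizon=2)
```

Because the intermediate states are explicit, a counterfactual edit in this sketch amounts to changing a state (for example, the starting position) before the rollout and rendering the result.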

Cite

Text

Tang et al. "Programmatic Video Prediction Using Large Language Models." ICLR 2025 Workshops: World_Models, 2025.

Markdown

[Tang et al. "Programmatic Video Prediction Using Large Language Models." ICLR 2025 Workshops: World_Models, 2025.](https://mlanthology.org/iclrw/2025/tang2025iclrw-programmatic/)

BibTeX

@inproceedings{tang2025iclrw-programmatic,
  title     = {{Programmatic Video Prediction Using Large Language Models}},
  author    = {Tang, Hao and Ellis, Kevin and Lohit, Suhas and Jones, Michael J. and Chatterjee, Moitreya},
  booktitle = {ICLR 2025 Workshops: World_Models},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/tang2025iclrw-programmatic/}
}