Transframer: Arbitrary Frame Prediction with Generative Models

Abstract

We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. Our approach unifies a broad range of tasks, from image segmentation to novel view synthesis and video interpolation. We pair this framework with an architecture we term Transframer, which uses U-Net and Transformer components to condition on annotated context frames, and outputs sequences of sparse, compressed image features. Transframer is the state-of-the-art on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30-second videos from a single image without any explicit geometric information. A single generalist Transframer simultaneously produces promising results on 8 tasks, including semantic segmentation, image classification and optical flow prediction with no task-specific architectural components, demonstrating that multi-task computer vision can be tackled using probabilistic image models. Our approach can in principle be applied to a wide range of applications that require learning the conditional structure of annotated image-formatted data.
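To make the conditioning pattern described above concrete, here is a minimal, illustrative sketch in JAX, not the authors' code: a strided convolution stands in for the U-Net context encoder, and a single cross-attention step stands in for the Transformer decoding over compressed image tokens. All sizes and the helper names (encode_context, cross_attend) are hypothetical.

import jax
import jax.numpy as jnp

def encode_context(frames, kernel):
    # frames: (N, C, H, W). One strided conv is a toy stand-in for the
    # U-Net that encodes annotated context frames in the paper.
    feats = jax.lax.conv_general_dilated(
        frames, kernel, window_strides=(2, 2), padding="SAME")
    n, c, h, w = feats.shape
    # Flatten spatial positions into a sequence of conditioning tokens.
    return feats.reshape(n, c, h * w).transpose(0, 2, 1)  # (N, HW, C)

def cross_attend(queries, context):
    # Scaled dot-product attention of target-frame tokens over the
    # context tokens (a stand-in for the Transformer component).
    scores = queries @ context.transpose(0, 2, 1) / jnp.sqrt(queries.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ context

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
frames = jax.random.normal(k1, (1, 3, 16, 16))   # one toy context frame
kernel = jax.random.normal(k2, (8, 3, 3, 3))     # OIHW conv weights
targets = jax.random.normal(k3, (1, 10, 8))      # 10 target-frame tokens
context = encode_context(frames, kernel)         # (1, 64, 8)
out = cross_attend(targets, context)             # (1, 10, 8)
print(out.shape)

In the actual model, the decoded sequence would be sparse, compressed image features (e.g. quantized DCT-style codes) predicted autoregressively; this sketch only shows the encode-then-attend data flow.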

Cite

Text

Nash et al. "Transframer: Arbitrary Frame Prediction with Generative Models." Transactions on Machine Learning Research, 2023.

Markdown

[Nash et al. "Transframer: Arbitrary Frame Prediction with Generative Models." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/nash2023tmlr-transframer/)

BibTeX

@article{nash2023tmlr-transframer,
  title     = {{Transframer: Arbitrary Frame Prediction with Generative Models}},
  author    = {Nash, Charlie and Carreira, Joao and Walker, Jacob C and Barr, Iain and Jaegle, Andrew and Malinowski, Mateusz and Battaglia, Peter},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/nash2023tmlr-transframer/}
}