Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
Abstract
Estimating motion primitives from video (e.g., optical flow and occlusion) is a critically important computer vision problem with many downstream applications, including controllable video generation and robotics. Current solutions are primarily supervised on synthetic data or require tuning of situation-specific heuristics, which inherently limits these models' capabilities in real-world contexts. A natural way to transcend these limitations would be to deploy large-scale, self-supervised video models, which can be trained scalably on unrestricted real-world video datasets. However, despite recent progress, extracting motion primitives from large pretrained video models remains relatively underexplored. In this work, we describe Opt-CWM, a self-supervised technique for estimating flow and occlusion from a pretrained video prediction model. Opt-CWM uses "counterfactual probes" to extract motion information from a base video model in a zero-shot fashion. The key problem we solve is optimizing the quality of these probes. We do so by combining an efficient parameterization of the space of counterfactual probes with a novel, generic sparse-prediction principle for learning the probe-generation parameters in a self-supervised fashion. Opt-CWM achieves state-of-the-art performance for motion estimation on real-world videos while requiring no labeled data.
Cite
Stojanov et al. "Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals." Advances in Neural Information Processing Systems, 2025.
@inproceedings{stojanov2025neurips-selfsupervised,
title = {{Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals}},
author = {Stojanov, Stefan and Wendt, David and Kim, Seungwoo and Venkatesh, Rahul Mysore and Feigelis, Kevin and Kotar, Klemen and Aw, Khai Loong and Wu, Jiajun and Yamins, Daniel LK},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/stojanov2025neurips-selfsupervised/}
}