Understanding Physical Dynamics with Counterfactual World Modeling
Abstract
The ability to understand physical dynamics is critical for agents to act in the world. Here, we use Counterfactual World Modeling (CWM) to extract visual structures for dynamics understanding. CWM uses a temporally-factored masking policy for masked prediction of video data without annotations. This policy enables highly effective “counterfactual prompting” of the predictor, allowing a spectrum of visual structures to be extracted from a single pre-trained predictor without finetuning on annotated datasets. We demonstrate that these structures are useful for physical dynamics understanding, allowing CWM to achieve state-of-the-art performance on the Physion benchmark. Code is available at https://neuroailab.github.io/cwm-physics/.
Cite
Text
Venkatesh et al. "Understanding Physical Dynamics with Counterfactual World Modeling." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72691-0_21
Markdown
[Venkatesh et al. "Understanding Physical Dynamics with Counterfactual World Modeling." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/venkatesh2024eccv-understanding/) doi:10.1007/978-3-031-72691-0_21
BibTeX
@inproceedings{venkatesh2024eccv-understanding,
title = {{Understanding Physical Dynamics with Counterfactual World Modeling}},
author = {Venkatesh, Rahul and Chen, Honglin and Feigelis, Kevin and Bear, Daniel M. and Jedoui, Khaled and Kotar, Klemen and Binder, Felix J. and Lee, Wanhee and Liu, Sherry and Smith, Kevin and Fan, Judith E. and Yamins, Daniel},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72691-0_21},
url = {https://mlanthology.org/eccv/2024/venkatesh2024eccv-understanding/}
}