Multi-Object System Identification from Videos

Abstract

We introduce the challenging problem of multi-object system identification from videos, for which prior methods are ill-suited because they focus on single-object scenes or on discrete material classification with a fixed set of material prototypes. To address this, we propose MOSIV, a new framework that directly optimizes continuous, per-object material parameters using a differentiable simulator guided by geometric objectives derived from video. We also present a new synthetic benchmark with contact-rich, multi-object interactions to facilitate evaluation. On this benchmark, MOSIV substantially improves grounding accuracy and long-horizon simulation fidelity over adapted baselines, establishing it as a strong reference point for this new task. Our analysis shows that fine-grained, object-level supervision and geometry-aligned objectives are critical for stable optimization in these complex, multi-object settings. The source code and dataset will be released.
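
As a rough illustration of what optimizing continuous, per-object parameters through a simulator can look like, the toy sketch below fits one stiffness value per object by gradient descent on a trajectory-matching loss. It is not the MOSIV implementation. Everything in it is an assumption made for illustration: the objects are two independent unit-mass springs with no contact, stiffness stands in for a material parameter, observed positions stand in for geometry extracted from video, and central finite differences stand in for the automatic differentiation a differentiable simulator would provide. All function names are hypothetical.

```python
# Toy sketch only: NOT the MOSIV implementation. Two independent unit-mass
# springs stand in for objects; stiffness stands in for a material parameter.

def simulate(stiffness, steps=100, dt=0.01):
    """Roll out one unit-mass spring and return its positions over time."""
    x, v = 1.0, 0.0
    positions = []
    for _ in range(steps):
        v -= stiffness * x * dt  # semi-implicit Euler: update velocity first
        x += v * dt              # then position, using the new velocity
        positions.append(x)
    return positions


def loss(params, observed):
    """Sum over objects of the mean squared position error."""
    total = 0.0
    for k, obs in zip(params, observed):
        pred = simulate(k)
        total += sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)
    return total


def gradient(params, observed, eps=1e-5):
    """Central finite differences, a stand-in for automatic differentiation."""
    grad = []
    for i in range(len(params)):
        hi = list(params)
        lo = list(params)
        hi[i] += eps
        lo[i] -= eps
        grad.append((loss(hi, observed) - loss(lo, observed)) / (2 * eps))
    return grad


def fit(observed, init, iters=300, step0=2.0):
    """Gradient descent with a halving line search that only accepts decreases."""
    params = list(init)
    current = loss(params, observed)
    for _ in range(iters):
        grad = gradient(params, observed)
        step = step0
        while step > 1e-8:
            trial = [p - step * g for p, g in zip(params, grad)]
            trial_loss = loss(trial, observed)
            if trial_loss < current:
                params, current = trial, trial_loss
                break
            step *= 0.5  # shrink the step until the loss goes down
    return params, current


# Hypothetical ground-truth stiffness per object, used to synthesize "observations".
TRUE_STIFFNESS = [2.0, 5.0]
observed = [simulate(k) for k in TRUE_STIFFNESS]

initial_guess = [1.0, 1.0]
initial_loss = loss(initial_guess, observed)
estimated, final_loss = fit(observed, initial_guess)
print("estimated stiffness per object:", estimated)
```

The loss keeps one error term per object, a loose analogue of object-level supervision. In a setting with contact between objects, the per-object terms would no longer decouple, which is where the stability concerns raised in the abstract come in.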

Cite

Text

Liu et al. "Multi-Object System Identification from Videos." International Conference on Learning Representations, 2026.

Markdown

[Liu et al. "Multi-Object System Identification from Videos." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/liu2026iclr-multiobject/)

BibTeX

@inproceedings{liu2026iclr-multiobject,
  title     = {{Multi-Object System Identification from Videos}},
  author    = {Liu, Chunjiang and Wang, Xiaoyuan and Lin, Qingran and Xiao, Albert and Chen, Haoyu and Wen, Shizheng and Zhang, Hao and Qi, Lu and Yang, Ming-Hsuan and Jeni, Laszlo A. and Xu, Min and Zhao, Yizhou},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/liu2026iclr-multiobject/}
}