CoMotion: Concurrent Multi-Person 3D Motion

Abstract

We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than matching detections across time, it updates poses directly from a new input image, which enables online tracking through occlusion. We train on numerous image and video datasets, leveraging pseudo-labeled annotations to produce a model that matches state-of-the-art systems in 3D pose estimation accuracy while being faster and more accurate in tracking multiple people through time.

Cite

Text

Newell et al. "CoMotion: Concurrent Multi-Person 3D Motion." International Conference on Learning Representations, 2025.

Markdown

[Newell et al. "CoMotion: Concurrent Multi-Person 3D Motion." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/newell2025iclr-comotion/)

BibTeX

@inproceedings{newell2025iclr-comotion,
  title     = {{CoMotion: Concurrent Multi-Person 3D Motion}},
  author    = {Newell, Alejandro and Hu, Peiyun and Lipson, Lahav and Richter, Stephan and Koltun, Vladlen},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/newell2025iclr-comotion/}
}