CoMotion: Concurrent Multi-Person 3D Motion
Abstract
We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than match detections across time, poses are updated directly from a new input image, which enables online tracking through occlusion. We train on numerous image and video datasets leveraging pseudo-labeled annotations to produce a model that matches state-of-the-art systems in 3D pose estimation accuracy while being faster and more accurate in tracking multiple people through time.
Cite
Text
Newell et al. "CoMotion: Concurrent Multi-Person 3D Motion." International Conference on Learning Representations, 2025.
Markdown
[Newell et al. "CoMotion: Concurrent Multi-Person 3D Motion." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/newell2025iclr-comotion/)
BibTeX
@inproceedings{newell2025iclr-comotion,
  title     = {{CoMotion: Concurrent Multi-Person 3D Motion}},
  author    = {Newell, Alejandro and Hu, Peiyun and Lipson, Lahav and Richter, Stephan and Koltun, Vladlen},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/newell2025iclr-comotion/}
}