Physics-Based Human Pose Estimation from a Single Moving RGB Camera

Abstract

Most monocular, physics-based human pose tracking methods, while achieving state-of-the-art results, suffer from artifacts when the scene does not have a strictly flat ground plane or when the camera is moving. Moreover, these methods are often evaluated either on in-the-wild real-world videos without ground-truth data or on synthetic datasets, which fail to model real-world light transport, camera motion, and pose-induced appearance and geometry changes. To tackle these two problems, we introduce MoviCam, the first non-synthetic dataset containing ground-truth camera trajectories of a dynamically moving monocular RGB camera, scene geometry, and 3D human motion with foot contact labels. Additionally, we propose PhysDynPose, a physics-based method that incorporates scene geometry and physical constraints for more accurate human motion tracking under camera motion and in non-flat scenes. More precisely, we use a state-of-the-art kinematics estimator to obtain the human pose and a robust SLAM method to capture the dynamic camera trajectory, enabling the recovery of the human pose in the world frame. We then refine the kinematic pose estimate using our scene-aware physics optimizer. On our new benchmark, we find that even state-of-the-art methods struggle with this inherently challenging setting, i.e., a moving camera and non-planar environments, while our method robustly estimates both human and camera poses in world coordinates.
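
The step of recovering the human pose in the world frame can be illustrated with a minimal sketch. It assumes the kinematics estimator returns 3D joints in the camera frame and that SLAM provides a camera-to-world rotation and translation for each frame; under that assumption, each joint maps as x_world = R_wc x_cam + t_wc. The function name `camera_to_world` and the names `R_wc` and `t_wc` are hypothetical and not taken from the paper, and the physics-based refinement is not shown.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def camera_to_world(joints_cam: Sequence[Vec3], R_wc: Mat3, t_wc: Vec3) -> List[List[float]]:
    """Map 3D joints from the camera frame to the world frame.

    Assumes (hypothetically) that R_wc is a 3x3 camera-to-world rotation and
    t_wc is the camera position in world coordinates for the same frame,
    so that x_world = R_wc @ x_cam + t_wc.
    """
    joints_world = []
    for x in joints_cam:
        joints_world.append([
            sum(R_wc[i][k] * x[k] for k in range(3)) + t_wc[i]
            for i in range(3)
        ])
    return joints_world


# Illustrative values: camera rotated 90 degrees about the z-axis
# and positioned at (1, 0, 0) in the world frame.
R_wc = [[0, -1, 0],
        [1, 0, 0],
        [0, 0, 1]]
t_wc = [1.0, 0.0, 0.0]

joints_cam = [[1.0, 0.0, 2.0]]  # one joint, expressed in the camera frame
joints_world = camera_to_world(joints_cam, R_wc, t_wc)
```

Applying this per frame with the per-frame camera pose would give a world-frame trajectory of kinematic joints, which is the kind of input a scene-aware refinement stage could then operate on.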

Cite

Text

Aytekin et al. "Physics-Based Human Pose Estimation from a Single Moving RGB Camera." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Aytekin et al. "Physics-Based Human Pose Estimation from a Single Moving RGB Camera." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/aytekin2025cvprw-physicsbased/)

BibTeX

@inproceedings{aytekin2025cvprw-physicsbased,
  title     = {{Physics-Based Human Pose Estimation from a Single Moving RGB Camera}},
  author    = {Aytekin, Ayce Idil and Li, Chuqiao and Luvizon, Diogo C. and Dabral, Rishabh and Oswald, Martin R. and Habermann, Marc and Theobalt, Christian},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {3891-3900},
  url       = {https://mlanthology.org/cvprw/2025/aytekin2025cvprw-physicsbased/}
}