Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera

Abstract

We propose an approach for reconstructing a free-moving object from a monocular RGB video. Most existing methods either assume a scene prior, a hand pose prior, or an object category pose prior, or rely on local optimization over multiple sequence segments. Our method allows free interaction with the object in front of a moving camera without relying on any prior, and optimizes the sequence globally without splitting it into segments. We progressively optimize the object shape and pose simultaneously based on an implicit neural representation. A key aspect of our method is a virtual camera system that significantly reduces the search space of the optimization. We evaluate our method on the standard HO3D dataset and on a collection of egocentric RGB sequences captured with a head-mounted device. We demonstrate that our approach significantly outperforms most methods and is on par with recent techniques that assume prior information.

Cite

Text

Shi et al. "Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I7.32736

Markdown

[Shi et al. "Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/shi2025aaai-free/) doi:10.1609/AAAI.V39I7.32736

BibTeX

@inproceedings{shi2025aaai-free,
  title     = {{Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera}},
  author    = {Shi, Haixin and Hu, Yinlin and Koguciuk, Daniel and Lin, Juan-Ting and Salzmann, Mathieu and Ferstl, David},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {6860--6868},
  doi       = {10.1609/AAAI.V39I7.32736},
  url       = {https://mlanthology.org/aaai/2025/shi2025aaai-free/}
}