Generalizable Single-View Object Pose Estimation by Two-Side Generating and Matching

Abstract

In this paper, we present a novel generalizable object pose estimation method that determines the object pose using only one RGB image. Unlike traditional approaches that rely on instance-level object pose estimation and necessitate extensive training data, our method offers generalization to unseen objects without extensive training, operates with a single reference image of the object, and eliminates the need for 3D object models or multiple views of the object. These characteristics are achieved by utilizing a diffusion model to generate novel-view images and conducting two-sided matching on these generated images. Quantitative experiments demonstrate the superiority of our method over existing pose estimation techniques across both synthetic and real-world datasets. Remarkably, our approach maintains strong performance even in scenarios with significant viewpoint changes, highlighting its robustness and versatility in challenging conditions. The code will be released at https://github.com/scy639/Gen2SM
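To make the "generate then match on both sides" idea in the abstract concrete, below is a minimal sketch, not the authors' implementation: `generate_novel_views`, `matching_score`, and the azimuth-only candidate search are all hypothetical stand-ins (a real system would plug in a diffusion-based novel-view synthesizer and an off-the-shelf image matcher, and search over full rotations). It only illustrates the control flow of scoring candidate viewpoints from both the query and reference sides and keeping the best one.

```python
import numpy as np


def generate_novel_views(image, azimuths_deg):
    """Hypothetical stand-in for a diffusion model: one 'generated view' per azimuth."""
    rng = np.random.default_rng(0)
    return {a: rng.random(image.shape) for a in azimuths_deg}


def matching_score(img_a, img_b):
    """Hypothetical stand-in for a 2D matcher: higher means better agreement."""
    return -float(np.mean((img_a - img_b) ** 2))


def estimate_relative_azimuth(query_img, reference_img, step_deg=30):
    """Two-side generate-and-match sketch (assumed pipeline, azimuth only):
    generate views around BOTH images, score matches in both directions,
    and return the viewpoint offset with the highest combined score."""
    candidates = list(range(0, 360, step_deg))
    views_from_ref = generate_novel_views(reference_img, candidates)
    views_from_qry = generate_novel_views(query_img, candidates)

    best_offset, best_score = None, -np.inf
    for offset in candidates:
        # Side 1: does the view generated from the reference at this offset match the query?
        s_fwd = matching_score(views_from_ref[offset], query_img)
        # Side 2: does the view generated from the query at the inverse offset match the reference?
        s_bwd = matching_score(views_from_qry[(360 - offset) % 360], reference_img)
        score = s_fwd + s_bwd
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


if __name__ == "__main__":
    h, w = 64, 64
    query = np.random.default_rng(1).random((h, w))
    reference = np.random.default_rng(2).random((h, w))
    print("estimated azimuth offset (deg):", estimate_relative_azimuth(query, reference))
```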

Cite

Text

Sun et al. "Generalizable Single-View Object Pose Estimation by Two-Side Generating and Matching." Winter Conference on Applications of Computer Vision, 2025.

Markdown

[Sun et al. "Generalizable Single-View Object Pose Estimation by Two-Side Generating and Matching." Winter Conference on Applications of Computer Vision, 2025.](https://mlanthology.org/wacv/2025/sun2025wacv-generalizable/)

BibTeX

@inproceedings{sun2025wacv-generalizable,
  title     = {{Generalizable Single-View Object Pose Estimation by Two-Side Generating and Matching}},
  author    = {Sun, Yujing and Sun, Caiyi and Liu, Yuan and Ma, Yuexin and Yiu, Siu Ming},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2025},
  pages     = {545--556},
  url       = {https://mlanthology.org/wacv/2025/sun2025wacv-generalizable/}
}