6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation

Abstract

Estimating the 6D object pose from a single RGB image often involves noise and indeterminacy due to challenges such as occlusions and cluttered backgrounds. Meanwhile, diffusion models have shown appealing performance in generating high-quality images from random noise with high indeterminacy through step-by-step denoising. Inspired by their denoising capability, we propose a novel diffusion-based framework (6D-Diff) to handle the noise and indeterminacy in object pose estimation for better performance. In our framework, to establish accurate 2D-3D correspondences, we formulate 2D keypoint detection as a reverse diffusion (denoising) process. To facilitate such a denoising process, we design a Mixture-of-Cauchy-based forward diffusion process and condition the reverse process on the object appearance features. Extensive experiments on the LM-O and YCB-V datasets demonstrate the effectiveness of our framework.
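The abstract's key idea is to corrupt 2D keypoint coordinates with Mixture-of-Cauchy noise during the forward process, so that the learned reverse process can denoise keypoints under heavy-tailed, occlusion-like perturbations. The sketch below is purely illustrative and not the paper's exact formulation: the linear blending schedule, the mixture parameters, and all function names are assumptions made for demonstration.

```python
import numpy as np

def moc_noise(shape, weights, locs, scales, rng):
    """Sample Mixture-of-Cauchy noise: pick a mixture component per
    element, then shift/scale a standard Cauchy draw accordingly."""
    comp = rng.choice(len(weights), size=shape, p=weights)
    z = rng.standard_cauchy(size=shape)
    return np.asarray(locs)[comp] + np.asarray(scales)[comp] * z

def forward_diffuse(kp0, t, T, weights, locs, scales, rng):
    """One illustrative forward step: blend clean 2D keypoints kp0
    toward Mixture-of-Cauchy noise as t grows.
    NOTE: the linear schedule here is a toy choice, not the paper's."""
    alpha = 1.0 - t / T
    eps = moc_noise(kp0.shape, weights, locs, scales, rng)
    return alpha * kp0 + (1.0 - alpha) * eps

rng = np.random.default_rng(0)
kp0 = rng.uniform(0, 64, size=(8, 2))  # 8 hypothetical 2D keypoints
kpt = forward_diffuse(kp0, t=250, T=1000,
                      weights=[0.7, 0.3], locs=[0.0, 5.0],
                      scales=[1.0, 3.0], rng=rng)
```

A reverse (denoising) network would then be trained, conditioned on object appearance features, to map `kpt` back toward `kp0` step by step; the heavy Cauchy tails let the forward process mimic the large keypoint outliers caused by occlusion.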

Cite

Text

Xu et al. "6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00924

Markdown

[Xu et al. "6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/xu2024cvpr-6ddiff/) doi:10.1109/CVPR52733.2024.00924

BibTeX

@inproceedings{xu2024cvpr-6ddiff,
  title     = {{6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation}},
  author    = {Xu, Li and Qu, Haoxuan and Cai, Yujun and Liu, Jun},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {9676--9686},
  doi       = {10.1109/CVPR52733.2024.00924},
  url       = {https://mlanthology.org/cvpr/2024/xu2024cvpr-6ddiff/}
}