6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation
Abstract
Estimating the 6D object pose from a single RGB image often involves noise and indeterminacy due to challenges such as occlusions and cluttered backgrounds. Meanwhile, diffusion models have shown appealing performance in generating high-quality images from random noise with high indeterminacy through step-by-step denoising. Inspired by their denoising capability, we propose a novel diffusion-based framework (6D-Diff) to handle the noise and indeterminacy in object pose estimation for better performance. In our framework, to establish accurate 2D-3D correspondence, we formulate 2D keypoint detection as a reverse diffusion (denoising) process. To facilitate such a denoising process, we design a Mixture-of-Cauchy-based forward diffusion process and condition the reverse process on the object appearance features. Extensive experiments on the LM-O and YCB-V datasets demonstrate the effectiveness of our framework.
Cite
Text
Xu et al. "6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00924
Markdown
[Xu et al. "6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/xu2024cvpr-6ddiff/) doi:10.1109/CVPR52733.2024.00924
BibTeX
@inproceedings{xu2024cvpr-6ddiff,
title = {{6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation}},
author = {Xu, Li and Qu, Haoxuan and Cai, Yujun and Liu, Jun},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {9676-9686},
doi = {10.1109/CVPR52733.2024.00924},
url = {https://mlanthology.org/cvpr/2024/xu2024cvpr-6ddiff/}
}