Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking

Abstract

6D object pose estimation is crucial in the field of computer vision. However, it suffers from a significant lack of large-scale and diverse datasets, which impedes comprehensive model evaluation and curtails downstream applications. To address these issues, this paper introduces Omni6DPose, a substantial benchmark characterized by its diversity of object categories, large scale, and variety of object materials. Omni6DPose is divided into three main components: ROPE (Real 6D Object Pose Estimation Dataset), which includes 332K images annotated with over 1.5M annotations across 581 instances in 149 categories; SOPE (Simulated 6D Object Pose Estimation Dataset), a simulated training set created by mixed reality and physics-based depth simulation; and PAM (Pose Aligned 3D Models), the manually aligned real scanned objects used in ROPE and SOPE. Omni6DPose is inherently challenging due to its substantial variations and ambiguities. To address this challenge, we introduce GenPose++, an enhanced version of the SOTA category-level 6D object pose estimation framework, incorporating two pivotal improvements: semantic-aware feature extraction and clustering-based aggregation. Moreover, we provide a comprehensive benchmarking analysis to evaluate the performance of previous methods on this new large-scale dataset for both 6D object pose estimation and pose tracking.

Cite

Text

Zhang et al. "Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73226-3_12

Markdown

[Zhang et al. "Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zhang2024eccv-omni6dpose/) doi:10.1007/978-3-031-73226-3_12

BibTeX

@inproceedings{zhang2024eccv-omni6dpose,
  title     = {{Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking}},
  author    = {Zhang, Jiyao and Huang, Weiyao and Peng, Bo and Wu, Mingdong and Hu, Fei and Chen, Zijian and Zhao, Bo and Dong, Hao},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73226-3_12},
  url       = {https://mlanthology.org/eccv/2024/zhang2024eccv-omni6dpose/}
}