Unified Category-Level Object Detection and Pose Estimation from RGB Images Using 3D Prototypes

Abstract

Recognizing objects in images is a fundamental problem in computer vision. Although detecting objects in 2D images is common, many applications require determining their pose in 3D space. Traditional category-level methods rely on RGB-D inputs, which may not always be available, or employ two-stage approaches that use separate models and representations for detection and pose estimation. For the first time, we introduce a unified model that integrates detection and pose estimation into a single framework for RGB images by leveraging neural mesh models with learned features and multi-model RANSAC. Our approach achieves state-of-the-art results for RGB category-level pose estimation on REAL275, improving on the previous state of the art by 22.9% averaged across all scale-agnostic metrics. Finally, we demonstrate that our unified method is more robust than single-stage baselines.

Cite

Text

Fischer et al. "Unified Category-Level Object Detection and Pose Estimation from RGB Images Using 3D Prototypes." International Conference on Computer Vision, 2025.

Markdown

[Fischer et al. "Unified Category-Level Object Detection and Pose Estimation from RGB Images Using 3D Prototypes." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/fischer2025iccv-unified/)

BibTeX

@inproceedings{fischer2025iccv-unified,
  title     = {{Unified Category-Level Object Detection and Pose Estimation from RGB Images Using 3D Prototypes}},
  author    = {Fischer, Tom and Zhang, Xiaojie and Ilg, Eddy},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {9790--9800},
  url       = {https://mlanthology.org/iccv/2025/fischer2025iccv-unified/}
}