Generic 3D Representation via Pose Estimation and Matching

Abstract

Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited. In this paper, we learn a generic 3D representation through solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide baseline feature matching. Our method is based upon the premise that by providing supervision over a set of carefully selected foundational tasks, generalization to novel tasks and abstraction capabilities can be achieved. We empirically show that the internal representation of a multi-task ConvNet trained to solve the above core problems generalizes to novel 3D tasks (e.g., scene layout estimation, object pose estimation, surface normal estimation) without the need for fine-tuning and shows traits of abstraction abilities (e.g., cross-modality pose estimation).
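The abstract's core idea, a shared feature extractor supervised by two proxy tasks (pose regression and siamese-style matching), can be sketched minimally as follows. This is an illustrative toy with random weights and a single linear layer standing in for the paper's ConvNet trunk; the dimensions, head shapes, and similarity score are assumptions for illustration, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared trunk: one random linear layer + ReLU standing in
# for the multi-task ConvNet's feature extractor (illustration only).
D_IN, D_FEAT = 32, 16
W_shared = rng.standard_normal((D_FEAT, D_IN))

def shared_features(x):
    """Shared representation consumed by both proxy-task heads."""
    return np.maximum(W_shared @ x, 0.0)  # ReLU

# Proxy task 1: object-centric camera pose, here a 3-DoF regression head.
W_pose = rng.standard_normal((3, D_FEAT))
def pose_head(x):
    return W_pose @ shared_features(x)

# Proxy task 2: wide-baseline matching, scored as cosine similarity
# between the shared features of two patches (siamese-style).
def match_score(x1, x2):
    f1, f2 = shared_features(x1), shared_features(x2)
    denom = np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-8
    return float(f1 @ f2 / denom)

patch_a = rng.standard_normal(D_IN)
patch_b = rng.standard_normal(D_IN)
print(pose_head(patch_a).shape)  # (3,)
print(match_score(patch_a, patch_b))
```

Because both heads read from `shared_features`, supervising them jointly is what shapes the internal representation that the paper then probes on novel 3D tasks.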

Cite

Text

Zamir et al. "Generic 3D Representation via Pose Estimation and Matching." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-46487-9_33

Markdown

[Zamir et al. "Generic 3D Representation via Pose Estimation and Matching." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/zamir2016eccv-generic/) doi:10.1007/978-3-319-46487-9_33

BibTeX

@inproceedings{zamir2016eccv-generic,
  title     = {{Generic 3D Representation via Pose Estimation and Matching}},
  author    = {Zamir, Amir R. and Wekel, Tilman and Agrawal, Pulkit and Wei, Colin and Malik, Jitendra and Savarese, Silvio},
  booktitle = {European Conference on Computer Vision},
  year      = {2016},
  pages     = {535--553},
  doi       = {10.1007/978-3-319-46487-9_33},
  url       = {https://mlanthology.org/eccv/2016/zamir2016eccv-generic/}
}