To the Point: Correspondence-Driven Monocular 3D Category Reconstruction

Abstract

We present To The Point (TTP), a method for reconstructing 3D objects from a single image using 2D-to-3D correspondences, given only foreground masks, a category-specific template, and optionally sparse keypoints for supervision. We recover a 3D shape from a 2D image by first regressing the 2D positions corresponding to the 3D template vertices and then jointly estimating a rigid camera transform and a non-rigid template deformation that optimally explain the 2D positions through the 3D shape projection. By relying on correspondences, we replace CNN-based regression of camera pose and non-rigid deformation with a simple per-sample optimization problem, and thereby obtain substantially more accurate 3D reconstructions. We treat this optimization as a differentiable layer and train the whole system end-to-end using geometry-driven losses. We report systematic quantitative improvements on multiple categories and provide qualitative results comprising diverse shape, pose, and texture prediction examples.
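
The per-sample fitting step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the 2D positions of the template vertices are already given, swaps the rigid camera for an affine camera so that each sub-problem is linear, and models the non-rigid deformation with a hypothetical linear basis `B`. All function names, the ridge weight `reg`, and the alternating scheme are illustrative assumptions, and the sketch does not show how such a solver would be differentiated for end-to-end training.

```python
import numpy as np

def fit_affine_camera(X, p):
    """Least-squares affine camera (assumption: affine, not rigid): p ~ X @ A.T + t."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous 3D points, (N, 4)
    M, *_ = np.linalg.lstsq(Xh, p, rcond=None)     # stacked camera parameters, (4, 2)
    return M[:3].T, M[3]                           # A is (2, 3), t is (2,)

def fit_deformation(V, B, A, t, p, reg=1e-3):
    """Ridge-regularised least squares for coefficients c of a linear basis B."""
    r = (p - (V @ A.T + t)).ravel()                         # residual of undeformed template
    J = np.stack([(Bk @ A.T).ravel() for Bk in B], axis=1)  # projected basis, (2N, K)
    return np.linalg.solve(J.T @ J + reg * np.eye(B.shape[0]), J.T @ r)

def deform(V, B, c):
    """Template vertices plus a linear combination of basis displacements."""
    return V + np.tensordot(c, B, axes=1)

def reprojection_error(X, A, t, p):
    """Sum of squared 2D distances between projected vertices and targets."""
    return float(np.sum((X @ A.T + t - p) ** 2))

def fit_shape_and_camera(V, B, p, n_iters=10, reg=1e-3):
    """Alternate between the camera fit and the deformation fit."""
    c = np.zeros(B.shape[0])
    for _ in range(n_iters):
        A, t = fit_affine_camera(deform(V, B, c), p)
        c = fit_deformation(V, B, A, t, p, reg)
    A, t = fit_affine_camera(deform(V, B, c), p)
    return A, t, c

# Synthetic example: random template, random basis, known camera.
rng = np.random.default_rng(0)
V = rng.normal(size=(50, 3))               # template vertices
B = 0.1 * rng.normal(size=(4, 50, 3))      # hypothetical deformation basis
c_true = rng.normal(size=4)
A_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
t_true = np.array([0.5, -0.2])
p = deform(V, B, c_true) @ A_true.T + t_true   # "regressed" 2D positions

A0, t0 = fit_affine_camera(V, p)
err_camera_only = reprojection_error(V, A0, t0, p)
A, t, c = fit_shape_and_camera(V, B, p)
err_joint = reprojection_error(deform(V, B, c), A, t, p)
```

Because each alternating step solves its sub-problem exactly, the joint fit cannot have a higher reprojection error than the camera-only fit on the undeformed template. Comparing `err_joint` with `err_camera_only` is a quick sanity check on the sketch.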

Cite

Text

Kokkinos and Kokkinos. "To the Point: Correspondence-Driven Monocular 3D Category Reconstruction." Neural Information Processing Systems, 2021.

Markdown

[Kokkinos and Kokkinos. "To the Point: Correspondence-Driven Monocular 3D Category Reconstruction." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/kokkinos2021neurips-point/)

BibTeX

@inproceedings{kokkinos2021neurips-point,
  title     = {{To the Point: Correspondence-Driven Monocular 3D Category Reconstruction}},
  author    = {Kokkinos, Filippos and Kokkinos, Iasonas},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/kokkinos2021neurips-point/}
}