Learning the 3D Fauna of the Web

Abstract

Learning 3D models of all animals in nature requires massively scaling up existing solutions. With this ultimate goal in mind, we develop 3D-Fauna, an approach that jointly learns a pan-category deformable 3D animal model for more than 100 animal species. One crucial bottleneck in modeling animals is the limited availability of training data, which we overcome by learning our model from 2D Internet images. We show that prior approaches, which are category-specific, fail to generalize to rare species with limited training images. We address this challenge by introducing the Semantic Bank of Skinned Models (SBSM), which automatically discovers a small set of base animal shapes by combining geometric inductive priors with semantic knowledge implicitly captured by an off-the-shelf self-supervised feature extractor. To train such a model, we also contribute a new large-scale dataset of diverse animal species. At inference time, given a single image of any quadruped animal, our model reconstructs an articulated 3D mesh in a feed-forward manner in seconds.
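The abstract does not spell out how the semantic bank selects a base shape. One plausible reading can be illustrated with a minimal sketch. It assumes that the bank stores K key vectors in the embedding space of the self-supervised feature extractor, each paired with a shape code. An input image's feature then soft-selects among the entries by cosine similarity followed by a temperature-scaled softmax. The function names, the `temperature` parameter, and the toy 2-D vectors are all hypothetical illustrations, not the authors' implementation. The downstream network that turns the blended code into a skinned mesh is not shown.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (small epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def softmax(scores: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable softmax; a lower temperature gives a sharper selection."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def query_shape_bank(
    image_feature: Sequence[float],
    bank_keys: Sequence[Sequence[float]],
    bank_values: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """Hypothetical bank lookup: weight each stored shape code by the
    semantic similarity between the image feature and that entry's key,
    then return the weights and the convex blend of the shape codes."""
    weights = softmax([cosine(image_feature, k) for k in bank_keys], temperature)
    dim = len(bank_values[0])
    blended = [sum(w * v[d] for w, v in zip(weights, bank_values)) for d in range(dim)]
    return weights, blended


# Toy bank with three entries (2-D keys and 2-D shape codes, for illustration only).
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, blended = query_shape_bank([0.9, 0.1], keys, values)
print(weights, blended)
```

With this toy input, the image feature is closest to the first key, so nearly all of the weight lands on the first entry and the blended code is close to `[1.0, 0.0]`. In this sketch the soft blend keeps the lookup differentiable, which is one reason such banks are often written this way rather than with a hard nearest-neighbour pick.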

Cite

Text

Li et al. "Learning the 3D Fauna of the Web." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00931

Markdown

[Li et al. "Learning the 3D Fauna of the Web." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/li2024cvpr-learning-b/) doi:10.1109/CVPR52733.2024.00931

BibTeX

@inproceedings{li2024cvpr-learning-b,
  title     = {{Learning the 3D Fauna of the Web}},
  author    = {Li, Zizhang and Litvak, Dor and Li, Ruining and Zhang, Yunzhi and Jakab, Tomas and Rupprecht, Christian and Wu, Shangzhe and Vedaldi, Andrea and Wu, Jiajun},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {9752-9762},
  doi       = {10.1109/CVPR52733.2024.00931},
  url       = {https://mlanthology.org/cvpr/2024/li2024cvpr-learning-b/}
}