GINA-3D: Learning to Generate Implicit Neural Assets in the Wild

Abstract

Modeling the 3D world from sensor data for simulation is a scalable way of developing testing and validation environments for robotic learning problems such as autonomous driving. However, manually creating or re-creating real-world-like environments is difficult, expensive, and not scalable. Recent generative model techniques have shown promising progress in addressing such challenges by learning 3D assets using only plentiful 2D images -- but they still suffer limitations, as they rely on either human-curated image datasets or renderings from manually created synthetic 3D environments. In this paper, we introduce GINA-3D, a generative model that uses real-world driving data from camera and LiDAR sensors to create photo-realistic 3D implicit neural assets of diverse vehicles and pedestrians. Compared to existing image datasets, the real-world driving setting poses new challenges due to occlusions, lighting variations, and long-tail distributions. GINA-3D tackles these challenges by decoupling representation learning and generative modeling into two stages with a learned tri-plane latent structure, inspired by recent advances in generative modeling of images. To evaluate our approach, we construct a large-scale object-centric dataset containing over 520K images of vehicles and pedestrians from the Waymo Open Dataset, and a new set of 80K images of long-tail instances such as construction equipment, garbage trucks, and cable cars. We compare our model with existing approaches and demonstrate that it achieves state-of-the-art performance in quality and diversity for both generated images and geometries.
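
The abstract names the tri-plane latent structure but does not spell out how it is queried. Below is a minimal sketch of a tri-plane feature lookup, assuming the common formulation (three axis-aligned feature planes, bilinear sampling, aggregation by summation); the names, shapes, and aggregation rule are illustrative assumptions, not the paper's actual implementation.

import numpy as np

def bilerp(plane, u, v):
    # Bilinearly interpolate a (C, H, W) feature plane at
    # normalized coordinates u, v in [0, 1].
    C, H, W = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[:, y0, x0]
            + wx * (1 - wy) * plane[:, y0, x1]
            + (1 - wx) * wy * plane[:, y1, x0]
            + wx * wy * plane[:, y1, x1])

def query_triplane(planes, point):
    # Project a 3D point in [0, 1]^3 onto the XY, XZ, and YZ planes,
    # sample a feature from each, and aggregate by summation; the
    # result would typically feed a small decoder MLP for density/color.
    x, y, z = point
    return (bilerp(planes["xy"], x, y)
            + bilerp(planes["xz"], x, z)
            + bilerp(planes["yz"], y, z))

# Example: random 32-channel planes at 64x64 resolution (hypothetical sizes).
rng = np.random.default_rng(0)
planes = {k: rng.standard_normal((32, 64, 64)) for k in ("xy", "xz", "yz")}
feat = query_triplane(planes, (0.25, 0.5, 0.75))
print(feat.shape)  # (32,)

Representing the latent as 2D planes rather than a full 3D grid keeps it compact, which is what lets the paper's second stage learn a generative prior over the latent structure rather than over dense voxels.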

Cite

Text

Shen et al. "GINA-3D: Learning to Generate Implicit Neural Assets in the Wild." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00476

Markdown

[Shen et al. "GINA-3D: Learning to Generate Implicit Neural Assets in the Wild." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/shen2023cvpr-gina3d/) doi:10.1109/CVPR52729.2023.00476

BibTeX

@inproceedings{shen2023cvpr-gina3d,
  title     = {{GINA-3D: Learning to Generate Implicit Neural Assets in the Wild}},
  author    = {Shen, Bokui and Yan, Xinchen and Qi, Charles R. and Najibi, Mahyar and Deng, Boyang and Guibas, Leonidas and Zhou, Yin and Anguelov, Dragomir},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {4913--4926},
  doi       = {10.1109/CVPR52729.2023.00476},
  url       = {https://mlanthology.org/cvpr/2023/shen2023cvpr-gina3d/}
}