GenAssets: Generating In-the-Wild 3D Assets in Latent Space

Abstract

High-quality 3D assets for traffic participants are critical for multi-sensor simulation, which is essential for the safe end-to-end development of autonomy. Building assets from in-the-wild data is key for diversity and realism, but existing neural-rendering-based reconstruction methods are slow and generate assets that render well only from viewpoints close to the original observations, limiting their usefulness in simulation. Recent diffusion-based generative models build complete and diverse assets, but perform poorly on in-the-wild driving scenes, where observed actors are captured under sparse and limited fields of view and are partially occluded. In this work, we propose a 3D latent diffusion model that learns from in-the-wild LiDAR and camera data captured by a sensor platform and generates high-quality 3D assets with complete geometry and appearance. Key to our method is a "reconstruct-then-generate" approach that first leverages occlusion-aware neural rendering trained over multiple scenes to build a high-quality latent space for objects, and then trains a diffusion model that operates on the latent space. We show that our method outperforms existing reconstruction- and generation-based methods, unlocking diverse and scalable content creation for simulation. Please visit https://waabi.ai/genassets for more details.

Cite

Text

Yang et al. "GenAssets: Generating In-the-Wild 3D Assets in Latent Space." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02086

Markdown

[Yang et al. "GenAssets: Generating In-the-Wild 3D Assets in Latent Space." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/yang2025cvpr-genassets/) doi:10.1109/CVPR52734.2025.02086

BibTeX

@inproceedings{yang2025cvpr-genassets,
  title     = {{GenAssets: Generating In-the-Wild 3D Assets in Latent Space}},
  author    = {Yang, Ze and Wang, Jingkang and Zhang, Haowei and Manivasagam, Sivabalan and Chen, Yun and Urtasun, Raquel},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {22392--22403},
  doi       = {10.1109/CVPR52734.2025.02086},
  url       = {https://mlanthology.org/cvpr/2025/yang2025cvpr-genassets/}
}