Towards Realistic Scene Generation with LiDAR Diffusion Models
Abstract
Diffusion models (DMs) excel in photo-realistic image synthesis, but their adaptation to LiDAR scene generation poses a substantial hurdle. This is primarily because DMs operating in the point space struggle to preserve the curve-like patterns and 3D geometry of LiDAR scenes, which consumes much of their representation power. In this paper, we propose LiDAR Diffusion Models (LiDMs) to generate LiDAR-realistic scenes from a latent space tailored to capture the realism of LiDAR scenes by incorporating geometric priors into the learning pipeline. Our method targets three major desiderata: pattern realism, geometry realism, and object realism. Specifically, we introduce curve-wise compression to simulate real-world LiDAR patterns, point-wise coordinate supervision to learn scene geometry, and patch-wise encoding for a full 3D object context. With these three core designs, our method achieves competitive performance on unconditional LiDAR generation in the 64-beam scenario and state-of-the-art results on conditional LiDAR generation, while maintaining high efficiency compared to point-based DMs (up to 107× faster). Furthermore, by compressing LiDAR scenes into a latent space, we enable the controllability of DMs with various conditions such as semantic maps, camera views, and text prompts. Our code and pretrained weights are available at https://github.com/hancyran/LiDAR-Diffusion.
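The abstract does not spell out the input representation, so the following is only an illustrative sketch and not the authors' implementation. One common way to expose the curve-like scan pattern of a rotating LiDAR is a spherical projection of the point cloud into a range image, where each row roughly corresponds to one beam. The function name `points_to_range_image`, the 64 x 1024 resolution, and the vertical field of view (+3° to -25°) are assumptions chosen for the example.

```python
import numpy as np


def points_to_range_image(points, height=64, width=1024,
                          fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud into a (height, width) range image.

    Hypothetical helper: sensor parameters are assumed, not taken from the paper.
    Empty pixels are filled with -1.
    """
    depth = np.linalg.norm(points, axis=1)
    valid = depth > 0                      # drop points at the sensor origin
    x, y, z = points[valid, 0], points[valid, 1], points[valid, 2]
    depth = depth[valid]

    yaw = np.arctan2(y, x)                 # azimuth in [-pi, pi]
    pitch = np.arcsin(z / depth)           # elevation angle

    fov_up = np.deg2rad(fov_up_deg)
    fov = fov_up - np.deg2rad(fov_down_deg)

    # Map angles to pixel coordinates: columns follow azimuth, rows follow elevation.
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (fov_up - pitch) / fov * height
    cols = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    rows = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    image = np.full((height, width), -1.0, dtype=np.float32)
    # Write far points first so that the nearest point per pixel is kept.
    order = np.argsort(depth)[::-1]
    image[rows[order], cols[order]] = depth[order]
    return image
```

In such an image, each row traces one scan curve, which is one way to picture what compression "along the curve" could act on; the actual encoder design is described in the paper and the linked repository.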
Cite
Text
Ran et al. "Towards Realistic Scene Generation with LiDAR Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01396
Markdown
[Ran et al. "Towards Realistic Scene Generation with LiDAR Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/ran2024cvpr-realistic/) doi:10.1109/CVPR52733.2024.01396
BibTeX
@inproceedings{ran2024cvpr-realistic,
title = {{Towards Realistic Scene Generation with LiDAR Diffusion Models}},
author = {Ran, Haoxi and Guizilini, Vitor and Wang, Yue},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {14738--14748},
doi = {10.1109/CVPR52733.2024.01396},
url = {https://mlanthology.org/cvpr/2024/ran2024cvpr-realistic/}
}