Text2LiDAR: Text-Guided LiDAR Point Clouds Generation via Equirectangular Transformer

Abstract

The complex traffic environment and various weather conditions make the collection of LiDAR data expensive and challenging. High-quality, controllable LiDAR data generation is therefore urgently needed. Controlling generation with text is common practice, but there is little research on it in this field. To this end, we propose Text2LiDAR, the first efficient, diverse, and text-controllable LiDAR data generation model. Specifically, we design an equirectangular transformer architecture whose equirectangular attention captures LiDAR features in a manner suited to the characteristics of the data. We then design a control-signal embedding injector that efficiently integrates control signals through a global-to-focused attention mechanism. Additionally, we devise a frequency modulator to help the model recover high-frequency details, ensuring the clarity of the generated point clouds. To foster development in the field and optimize text-controlled generation performance, we construct nuLiDARtext, which offers diverse text descriptors for 34,149 LiDAR point clouds from 850 scenes. Experiments on uncontrolled and text-controlled generation in various forms on the KITTI-360 and nuScenes datasets demonstrate the superiority of our approach. The project can be found at https://github.com/wuyang98/Text2LiDAR
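
The abstract does not spell out the equirectangular representation. As background, LiDAR scans are commonly stored as equirectangular range images, where each pixel corresponds to one azimuth/elevation direction and holds the measured depth. The sketch below illustrates that standard projection, assuming this is the representation meant here. It is not taken from the Text2LiDAR code; the function name `points_to_range_image`, the image size, and the vertical field of view are hypothetical values chosen for illustration.

```python
import numpy as np


def points_to_range_image(points, height=64, width=1024,
                          fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) array of x, y, z points onto an equirectangular
    range image of shape (height, width). Empty pixels are 0.

    All defaults are illustrative assumptions, not values from the paper.
    """
    depth = np.linalg.norm(points, axis=1)
    valid = depth > 0  # drop points at the sensor origin
    x, y, z = points[valid, 0], points[valid, 1], points[valid, 2]
    depth = depth[valid]

    azimuth = np.arctan2(y, x)        # horizontal angle in [-pi, pi]
    elevation = np.arcsin(z / depth)  # vertical angle

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)

    # Map angles linearly to pixel coordinates.
    u = 0.5 * (1.0 - azimuth / np.pi) * width
    v = (fov_up - elevation) / (fov_up - fov_down) * height

    cols = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    rows = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    # Keep the nearest return when several points fall into one pixel.
    image = np.full((height, width), np.inf, dtype=np.float64)
    np.minimum.at(image, (rows, cols), depth)
    image[np.isinf(image)] = 0.0
    return image.astype(np.float32)
```

Points outside the assumed vertical field of view are clamped to the top or bottom row in this sketch; a real pipeline would more likely discard them.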

Cite

Text

Wu et al. "Text2LiDAR: Text-Guided LiDAR Point Clouds Generation via Equirectangular Transformer." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72992-8_17

Markdown

[Wu et al. "Text2LiDAR: Text-Guided LiDAR Point Clouds Generation via Equirectangular Transformer." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/wu2024eccv-text2lidar/) doi:10.1007/978-3-031-72992-8_17

BibTeX

@inproceedings{wu2024eccv-text2lidar,
  title     = {{Text2LiDAR: Text-Guided LiDAR Point Clouds Generation via Equirectangular Transformer}},
  author    = {Wu, Yang and Zhang, Kaihua and Qian, Jianjun and Xie, Jin and Yang, Jian},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72992-8_17},
  url       = {https://mlanthology.org/eccv/2024/wu2024eccv-text2lidar/}
}