GeoAuxNet: Towards Universal 3D Representation Learning for Multi-Sensor Point Clouds

Abstract

Point clouds captured by different sensors, such as RGB-D cameras and LiDAR, possess non-negligible domain gaps. Most existing methods design different network architectures and train separately on point clouds from various sensors. Typically, point-based methods achieve outstanding performance on evenly distributed dense point clouds from RGB-D cameras, while voxel-based methods are more efficient for large-range sparse LiDAR point clouds. In this paper, we propose geometry-to-voxel auxiliary learning to enable voxel representations to access point-level geometric information, which supports better generalisation of the voxel-based backbone with additional interpretations of multi-sensor point clouds. Specifically, we construct hierarchical geometry pools generated by a voxel-guided dynamic point network, which efficiently provide auxiliary fine-grained geometric information adapted to different stages of voxel features. We conduct experiments on joint multi-sensor datasets to demonstrate the effectiveness of GeoAuxNet. Enjoying elaborate geometric information, our method outperforms other models collectively trained on multi-sensor datasets and achieves competitive results with state-of-the-art experts on each single dataset.
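To illustrate the core idea described in the abstract, the sketch below shows one way a voxel feature stage could query a pool of point-level geometric prototypes and fuse the retrieved cues back into the voxel features. This is a minimal PyTorch sketch under our own assumptions, not the authors' implementation: the class names (GeometryPool, GeoToVoxelStage), the learnable prototype bank, the attention-style lookup, and the feature dimensions are all hypothetical placeholders for the paper's hierarchical geometry pools and voxel-guided dynamic point network.

import torch
import torch.nn as nn

class GeometryPool(nn.Module):
    # Hypothetical geometry pool: a learnable bank of point-level geometric
    # prototypes that voxel features can query for fine-grained cues.
    def __init__(self, num_entries: int, dim: int):
        super().__init__()
        self.pool = nn.Parameter(torch.randn(num_entries, dim))

    def forward(self, voxel_feats: torch.Tensor) -> torch.Tensor:
        # voxel_feats: (num_voxels, dim). Attend over pool entries to fetch
        # auxiliary geometric information for each voxel.
        scale = voxel_feats.shape[-1] ** 0.5
        attn = torch.softmax(voxel_feats @ self.pool.t() / scale, dim=-1)
        return attn @ self.pool  # (num_voxels, dim)

class GeoToVoxelStage(nn.Module):
    # One voxel stage augmented with geometry-to-voxel auxiliary features
    # (a stand-in for one stage of a voxel-based backbone).
    def __init__(self, dim: int, pool_size: int = 64):
        super().__init__()
        self.voxel_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.geometry_pool = GeometryPool(pool_size, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, voxel_feats: torch.Tensor) -> torch.Tensor:
        v = self.voxel_mlp(voxel_feats)
        g = self.geometry_pool(v)  # auxiliary point-level geometric cues
        return self.fuse(torch.cat([v, g], dim=-1))

if __name__ == "__main__":
    stage = GeoToVoxelStage(dim=32)
    voxels = torch.randn(128, 32)  # toy voxel features
    print(stage(voxels).shape)     # torch.Size([128, 32])

In the actual method, such pools would be built hierarchically and adapted to different stages of the voxel backbone; the sketch only conveys the query-and-fuse pattern.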

Cite

Text

Zhang et al. "GeoAuxNet: Towards Universal 3D Representation Learning for Multi-Sensor Point Clouds." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01892

Markdown

[Zhang et al. "GeoAuxNet: Towards Universal 3D Representation Learning for Multi-Sensor Point Clouds." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zhang2024cvpr-geoauxnet/) doi:10.1109/CVPR52733.2024.01892

BibTeX

@inproceedings{zhang2024cvpr-geoauxnet,
  title     = {{GeoAuxNet: Towards Universal 3D Representation Learning for Multi-Sensor Point Clouds}},
  author    = {Zhang, Shengjun and Fei, Xin and Duan, Yueqi},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {20019-20028},
  doi       = {10.1109/CVPR52733.2024.01892},
  url       = {https://mlanthology.org/cvpr/2024/zhang2024cvpr-geoauxnet/}
}