DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment

Abstract

Information inside visual and LiDAR data is well complementary derived from the fine-grained texture of images and massive geometric information in point clouds. However, it remains challenging to explore effective visual-LiDAR fusion, mainly due to the intrinsic data structure inconsistency between two modalities: Image pixels are regular and dense, but LiDAR points are unordered and sparse. To address the problem, we propose a local-to-global fusion network (DVLO) with bi-directional structure alignment. To obtain locally fused features, we project points onto the image plane as cluster centers and cluster image pixels around each center. Image pixels are pre-organized as pseudo points for image-to-point structure alignment. Then, we convert points to pseudo images by cylindrical projection (point-to-image structure alignment) and perform adaptive global feature fusion between point features and local fused features. Our method achieves state-of-the-art performance on KITTI odometry and FlyingThings3D scene flow datasets compared to both single-modal and multi-modal methods. Codes are released at https://github.com/IRMVLab/DVLO.

Cite

Text

Liu et al. "DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72684-2_27

Markdown

[Liu et al. "DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/liu2024eccv-dvlo/) doi:10.1007/978-3-031-72684-2_27

BibTeX

@inproceedings{liu2024eccv-dvlo,
  title     = {{DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment}},
  author    = {Liu, Jiuming and Zhuo, Dong and Feng, Zhiheng and Zhu, Siting and Peng, Chensheng and Liu, Zhe and Wang, Hesheng},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72684-2_27},
  url       = {https://mlanthology.org/eccv/2024/liu2024eccv-dvlo/}
}