Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals

Abstract

Learning the dense bird's eye view (BEV) motion flow in a self-supervised manner is an emerging research topic for robotics and autonomous driving. Current self-supervised methods mainly rely on point correspondences between point clouds, which may introduce the problems of fake flow and inconsistency, hindering the model's ability to learn accurate and realistic motion. In this paper, we introduce a novel cross-modality self-supervised training framework that effectively addresses these issues by leveraging multi-modality data to obtain supervision signals. We design three innovative supervision signals to preserve the inherent properties of scene motion: the masked Chamfer distance loss, the piecewise rigidity loss, and the temporal consistency loss. Through extensive experiments, we demonstrate that our proposed self-supervised framework outperforms all previous self-supervised methods on the motion prediction task.
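The masked Chamfer distance mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it is a generic symmetric Chamfer distance between two point sets with optional boolean masks (the function name, mask arguments, and NumPy realization are assumptions for illustration):

```python
import numpy as np

def masked_chamfer_distance(pred_pts, target_pts, pred_mask=None, target_mask=None):
    """Symmetric Chamfer distance between two point sets (N, D) and (M, D).

    Optional boolean masks select which points participate, e.g. to exclude
    points without valid correspondences. Illustrative sketch only; the
    paper's actual masking criterion may differ.
    """
    if pred_mask is not None:
        pred_pts = pred_pts[pred_mask]
    if target_mask is not None:
        target_pts = target_pts[target_mask]
    # Pairwise squared distances, shape (N, M)
    d2 = np.sum((pred_pts[:, None, :] - target_pts[None, :, :]) ** 2, axis=-1)
    # Average nearest-neighbour distance in both directions
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

For identical point sets the distance is zero; masking lets the loss ignore points that have no plausible match in the other frame, which is one way to mitigate the fake-flow issue the abstract describes.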

Cite

Text

Fang et al. "Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I2.27940

Markdown

[Fang et al. "Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/fang2024aaai-self/) doi:10.1609/AAAI.V38I2.27940

BibTeX

@inproceedings{fang2024aaai-self,
  title     = {{Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals}},
  author    = {Fang, Shaoheng and Liu, Zuhong and Wang, Mingyu and Xu, Chenxin and Zhong, Yiqi and Chen, Siheng},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {1726-1734},
  doi       = {10.1609/AAAI.V38I2.27940},
  url       = {https://mlanthology.org/aaai/2024/fang2024aaai-self/}
}