Valeo4Cast: A Modular Approach to End-to-End Forecasting

Abstract

Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect and track from sensor data (cameras or LiDARs) the past trajectories of the different elements of the scene and predict their future locations. We depart from the current trend of tackling this task via end-to-end training from perception to forecasting, and instead use a modular approach. We individually build and train detection, tracking and forecasting modules. We then only use consecutive finetuning steps to integrate the modules better and alleviate compounding errors. We conduct an in-depth study on the finetuning strategies and it reveals that our simple yet effective approach significantly improves performance on the end-to-end forecasting benchmark. Consequently, our solution ranks first in the Argoverse 2 End-to-end Forecasting Challenge, with 63.82 mAP$_\text{f}$. We surpass forecasting results by +17.1 points over last year’s winner and by +13.3 points over this year’s runner-up. This remarkable performance in forecasting can be explained by our modular paradigm, which integrates finetuning strategies and significantly outperforms the end-to-end-trained counterparts.

Cite

Text

Xu et al. "Valeo4Cast: A Modular Approach to End-to-End Forecasting." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91767-7_1

Markdown

[Xu et al. "Valeo4Cast: A Modular Approach to End-to-End Forecasting." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/xu2024eccvw-valeo4cast/) doi:10.1007/978-3-031-91767-7_1

BibTeX

@inproceedings{xu2024eccvw-valeo4cast,
  title     = {{Valeo4Cast: A Modular Approach to End-to-End Forecasting}},
  author    = {Xu, Yihong and Zablocki, Éloi and Boulch, Alexandre and Puy, Gilles and Chen, Mickaël and Bartoccioni, Florent and Samet, Nermin and Siméoni, Oriane and Gidaris, Spyros and Vu, Tuan-Hung and Bursuc, Andrei and Valle, Eduardo and Marlet, Renaud and Cord, Matthieu},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2024},
  pages     = {1--14},
  doi       = {10.1007/978-3-031-91767-7_1},
  url       = {https://mlanthology.org/eccvw/2024/xu2024eccvw-valeo4cast/}
}