SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
Abstract
The integration of Vision-Language Models (VLMs) into autonomous driving systems has shown promise in addressing key challenges such as learning complexity, interpretability, and common-sense reasoning. However, existing approaches often struggle with efficient integration and real-time decision-making due to computational demands. In this paper, we introduce SOLVE, an innovative framework that synergizes VLMs with end-to-end (E2E) models to enhance autonomous vehicle planning. Our approach emphasizes knowledge sharing at the feature level through a shared visual encoder, enabling comprehensive interaction between VLM and E2E components. We propose a Trajectory Chain-of-Thought (T-CoT) paradigm, which progressively refines trajectory predictions, reducing uncertainty and improving accuracy. By employing a temporal decoupling strategy, SOLVE achieves efficient asynchronous cooperation, aligning high-quality VLM outputs with E2E real-time performance. Evaluated on the nuScenes dataset, our method demonstrates significant improvements in trajectory prediction accuracy, paving the way for more robust and reliable autonomous driving systems.
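The temporal decoupling idea mentioned in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. The class `AsyncPlanner`, the methods `slow_plan`, `fast_plan`, and `step`, the `slow_period` parameter, and the averaging rule used to combine trajectories are all hypothetical placeholders chosen for illustration. The sketch assumes only that a slow planner (standing in for the VLM) produces an output that becomes available some frames later, while a fast planner (standing in for the E2E model) runs every frame and reuses the most recent slow output as a prior.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Trajectory = List[Tuple[float, float]]  # (x, y) waypoints in the ego frame


@dataclass
class AsyncPlanner:
    """Toy model of asynchronous cooperation between a slow and a fast planner.

    All names and the fusion rule are illustrative assumptions, not the
    method described in the paper.
    """
    slow_period: int = 5                     # hypothetical: frames per slow-planner call
    pending: Optional[Trajectory] = None     # slow output still "being computed"
    cached_prior: Optional[Trajectory] = None  # latest slow output that is available

    def slow_plan(self, frame_idx: int) -> Trajectory:
        # Stand-in for an expensive VLM call: three waypoints with a
        # lateral offset that depends on the frame it was launched from.
        return [(float(t), 1.0 + frame_idx) for t in range(1, 4)]

    def fast_plan(self, frame_idx: int, prior: Optional[Trajectory]) -> Trajectory:
        # Stand-in for a real-time E2E planner: drive straight ahead,
        # then average with the prior if one is available (assumed rule).
        base = [(float(t), 0.0) for t in range(1, 4)]
        if prior is None:
            return base
        return [((bx + px) / 2, (by + py) / 2)
                for (bx, by), (px, py) in zip(base, prior)]

    def step(self, frame_idx: int) -> Trajectory:
        if frame_idx % self.slow_period == 0:
            # The slow result launched one period ago becomes usable now,
            # and a new slow computation is launched for the current frame.
            self.cached_prior = self.pending
            self.pending = self.slow_plan(frame_idx)
        # The fast planner runs on every frame without waiting.
        return self.fast_plan(frame_idx, self.cached_prior)


if __name__ == "__main__":
    planner = AsyncPlanner(slow_period=2)
    for i in range(5):
        print(i, planner.step(i))
```

In this sketch the fast planner never blocks on the slow one: frames 0 and 1 run without a prior, and from frame 2 onward each fast output uses a slow result that is one period old.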
Cite
Text
Chen et al. "SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01127
Markdown
[Chen et al. "SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/chen2025cvpr-solve/) doi:10.1109/CVPR52734.2025.01127
BibTeX
@inproceedings{chen2025cvpr-solve,
title = {{SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving}},
author = {Chen, Xuesong and Huang, Linjiang and Ma, Tao and Fang, Rongyao and Shi, Shaoshuai and Li, Hongsheng},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {12068-12077},
doi = {10.1109/CVPR52734.2025.01127},
url = {https://mlanthology.org/cvpr/2025/chen2025cvpr-solve/}
}