ZeroVO: Visual Odometry with Minimal Assumptions
Abstract
We introduce ZeroVO, a novel visual odometry (VO) algorithm that achieves zero-shot generalization across diverse cameras and environments, overcoming limitations in existing methods that depend on predefined or static camera calibration setups. Our approach incorporates three main innovations. First, we design a calibration-free, geometry-aware network structure capable of handling noise in estimated depth and camera parameters. Second, we introduce a language-based prior that infuses semantic information to enhance robust feature extraction and generalization to previously unseen domains. Third, we develop a flexible, semi-supervised training paradigm that iteratively adapts to new scenes using unlabeled data, further boosting the model's ability to generalize across diverse real-world scenarios. We analyze complex autonomous driving contexts, demonstrating over 30% improvement over prior methods on three standard benchmarks--KITTI, nuScenes, and Argoverse 2--as well as a newly introduced, high-fidelity synthetic dataset derived from Grand Theft Auto (GTA). By not requiring fine-tuning or camera calibration, our work broadens the applicability of VO, providing a versatile solution for real-world deployment at scale.
Cite
Text
Lai et al. "ZeroVO: Visual Odometry with Minimal Assumptions." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01593
Markdown
[Lai et al. "ZeroVO: Visual Odometry with Minimal Assumptions." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/lai2025cvpr-zerovo/) doi:10.1109/CVPR52734.2025.01593
BibTeX
@inproceedings{lai2025cvpr-zerovo,
title = {{ZeroVO: Visual Odometry with Minimal Assumptions}},
author = {Lai, Lei and Yin, Zekai and Ohn-Bar, Eshed},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {17092-17102},
doi = {10.1109/CVPR52734.2025.01593},
url = {https://mlanthology.org/cvpr/2025/lai2025cvpr-zerovo/}
}