End2End Multi-View Feature Matching with Differentiable Pose Optimization
Abstract
Erroneous feature matches have a severe impact on subsequent camera pose estimation and often require additional, time-consuming measures, such as RANSAC, for outlier rejection. Our method tackles this challenge by addressing feature matching and pose optimization jointly. To this end, we propose a graph attention network that predicts image correspondences along with confidence weights. The resulting matches serve as weighted constraints in a differentiable pose estimation. Training feature matching with gradients from pose optimization naturally learns to down-weight outliers and boosts pose estimation on image pairs by 6.7% over SuperGlue on ScanNet. At the same time, it reduces pose estimation time by over 50% and renders RANSAC iterations unnecessary. Moreover, we integrate information from multiple views by spanning the graph across multiple frames to predict all matches at once. Multi-view matching combined with end-to-end training improves the pose estimation metrics on Matterport3D by 18.5% compared to SuperGlue.
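To illustrate the core idea of confidence-weighted matches driving a differentiable pose solve, the following sketch uses a weighted Kabsch alignment: a closed-form, SVD-based (and therefore differentiable) rigid-pose estimate from weighted point correspondences. This is an illustrative stand-in, not the paper's exact optimization; the function name and setup are assumptions for the example.

```python
import numpy as np

def weighted_pose(p, q, w):
    """Weighted Kabsch: find R, t minimizing sum_i w_i * ||R @ p_i + t - q_i||^2.
    Because the solution is a closed-form SVD, gradients of the pose can flow
    back into the confidence weights w, so a matcher trained end-to-end learns
    to down-weight outlier correspondences (illustrative sketch, not the
    paper's implementation)."""
    w = w / w.sum()
    p_c = (w[:, None] * p).sum(axis=0)            # weighted centroids
    q_c = (w[:, None] * q).sum(axis=0)
    H = (p - p_c).T @ (w[:, None] * (q - q_c))    # weighted covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_c - R @ p_c
    return R, t
```

A correspondence assigned near-zero weight contributes almost nothing to the objective, so a single gross outlier no longer corrupts the recovered pose, which is the mechanism the abstract describes replacing RANSAC's explicit outlier rejection.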
Cite
Text
Roessle and Nießner. "End2End Multi-View Feature Matching with Differentiable Pose Optimization." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00050
Markdown
[Roessle and Nießner. "End2End Multi-View Feature Matching with Differentiable Pose Optimization." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/roessle2023iccv-end2end/) doi:10.1109/ICCV51070.2023.00050
BibTeX
@inproceedings{roessle2023iccv-end2end,
title = {{End2End Multi-View Feature Matching with Differentiable Pose Optimization}},
author = {Roessle, Barbara and Nießner, Matthias},
booktitle = {International Conference on Computer Vision},
year = {2023},
pages = {477-487},
doi = {10.1109/ICCV51070.2023.00050},
url = {https://mlanthology.org/iccv/2023/roessle2023iccv-end2end/}
}