Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses

Abstract

Stereo depth estimation is a fundamental component of augmented reality (AR) and requires low latency for real-time processing. However, preprocessing such as rectification and non-ML computations such as cost volume construction incur significant latency, exceeding that of the ML model itself, which hinders the real-time processing AR requires. We therefore develop alternatives to rectification and the cost volume that account for the ML accelerators (GPUs and NPUs) in recent hardware. We eliminate preprocessing by introducing a homography matrix prediction network with a rectification positional encoding (RPE), which delivers both low latency and robustness to unrectified images. We replace the cost volume with a group-pointwise convolution-based operator and an approximation of cosine similarity based on layernorm and dot product. Based on these approaches, we develop the MultiHeadDepth (replacing the cost volume) and HomoDepth (MultiHeadDepth plus removed preprocessing) models. MultiHeadDepth provides 11.8-30.3% higher accuracy and 22.9-25.2% lower latency than a state-of-the-art depth estimation model for AR glasses from industry. HomoDepth, which processes unrectified images directly, reduces end-to-end latency by 44.5%. We also introduce a multi-task learning method to handle misaligned stereo inputs in HomoDepth, which reduces AbsRel error by 10.0-24.3%. The overall results demonstrate the efficacy of our approaches, which not only reduce inference latency but also improve model performance. Our code is available at https://github.com/UCI-ISA-Lab/MultiHeadDepth-HomoDepth
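The layernorm-plus-dot-product trick mentioned in the abstract can be illustrated with a minimal NumPy sketch (this is an assumed reading, not the paper's implementation): LayerNorm gives each vector zero mean and unit variance, so its L2 norm is roughly the square root of its dimension, and the dot product divided by that dimension then tracks cosine similarity without an explicit per-vector norm.

```python
import numpy as np

def layernorm(x, eps=1e-5):
    # Per-vector LayerNorm: zero mean, unit variance (no learned affine).
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def approx_cosine(a, b):
    # After LayerNorm each vector has ~unit variance, so ||x|| ~= sqrt(d);
    # the dot product scaled by 1/d therefore approximates cosine similarity.
    d = a.shape[-1]
    return layernorm(a) @ layernorm(b) / d

def true_cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

For zero-mean feature vectors the approximation matches the exact cosine up to the `eps` term; the appeal on NPUs/GPUs is that layernorm and dot products are standard, well-accelerated primitives.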

Cite

Text

Liu and Kwon. "Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00586

Markdown

[Liu and Kwon. "Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/liu2025cvpr-efficient-a/) doi:10.1109/CVPR52734.2025.00586

BibTeX

@inproceedings{liu2025cvpr-efficient-a,
  title     = {{Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses}},
  author    = {Liu, Yongfan and Kwon, Hyoukjun},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {6252-6261},
  doi       = {10.1109/CVPR52734.2025.00586},
  url       = {https://mlanthology.org/cvpr/2025/liu2025cvpr-efficient-a/}
}