3D Vision and Language Pretraining with Large-Scale Synthetic Data
Abstract
Two-view correspondence learning is a key task in computer vision that aims to establish reliable matching relationships for applications such as camera pose estimation and 3D reconstruction. However, existing methods are limited in local geometric modeling and cross-stage information optimization, making it difficult to accurately capture the geometric constraints of matched pairs and thus reducing the robustness of the model. To address these challenges, we propose a Multi-Graph Contextual Attention Network (MGCA-Net), which consists of a Contextual Geometric Attention (CGA) module and a Cross-Stage Multi-Graph Consensus (CSMGC) module. Specifically, CGA dynamically integrates spatial position and feature information via an adaptive attention mechanism, enhancing the capability to capture both local and global geometric relationships. Meanwhile, CSMGC establishes geometric consensus via a cross-stage sparse graph network, ensuring the consistency of geometric information across stages. Experimental results on two representative datasets, YFCC100M and SUN3D, show that MGCA-Net significantly outperforms existing state-of-the-art (SOTA) methods on the outlier rejection and camera pose estimation tasks. Source code is available at http://www.linshuyuan.com.
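The abstract describes CGA as an attention mechanism that fuses the spatial positions of matched point pairs with their per-correspondence features. As a rough illustration of that idea only, here is a minimal NumPy sketch; all names, shapes, and the scaled dot-product formulation are assumptions for exposition, not the authors' CGA module.

```python
import numpy as np

def contextual_attention(coords, feats):
    """Toy attention over N candidate correspondences.

    coords: (N, 4) array of matched point pairs (x1, y1, x2, y2).
    feats:  (N, D) array of per-correspondence descriptors.
    Returns an (N, D) array of features re-weighted by a joint
    spatial/feature affinity. Illustrative sketch only; NOT the
    CGA module from the paper.
    """
    x = np.concatenate([coords, feats], axis=1)     # fuse position + feature
    scores = x @ x.T / np.sqrt(x.shape[1])          # scaled dot-product affinity
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # row-wise softmax
    return attn @ feats                             # aggregate neighbor features

rng = np.random.default_rng(0)
out = contextual_attention(rng.normal(size=(8, 4)), rng.normal(size=(8, 16)))
print(out.shape)  # (8, 16)
```

The point of the sketch is that concatenating coordinates with descriptors before computing affinities lets spatially consistent matches attend to one another, which is the intuition behind combining local and global geometric context.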
Cite
Yang et al. "3D Vision and Language Pretraining with Large-Scale Synthetic Data." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/172
BibTeX
@inproceedings{yang2024ijcai-d,
title = {{3D Vision and Language Pretraining with Large-Scale Synthetic Data}},
author = {Yang, Dejie and Xu, Zhu and Mo, Wentao and Chen, Qingchao and Huang, Siyuan and Liu, Yang},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2024},
pages = {1552-1560},
doi = {10.24963/ijcai.2024/172},
url = {https://mlanthology.org/ijcai/2024/yang2024ijcai-d/}
}