TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment
Abstract
While vision-language models have tightly linked text and image features, incorporating 3D modality data, such as point clouds and 3D Gaussians, further enables pretraining for 3D-related tasks, e.g., cross-modal retrieval, zero-shot classification, and scene recognition. Because extracting 3D modal features and bridging the gap between modalities remain challenging, we propose TIGaussian, a framework that harnesses the characteristics of 3D Gaussian Splatting (3DGS) to strengthen cross-modal alignment through a multi-branch 3DGS tokenizer and modality-specific 3D feature alignment strategies. Specifically, our multi-branch 3DGS tokenizer decouples the intrinsic properties of 3DGS structures into compact latent representations, enabling more generalizable feature extraction. To further bridge the modality gap, we develop bidirectional cross-modal alignment strategies: a multi-view feature fusion mechanism that leverages diffusion priors to resolve perspective ambiguity in image-3D alignment, and a text-3D projection module that adaptively maps 3D features to the text embedding space for better text-3D alignment. Extensive experiments on various datasets demonstrate the state-of-the-art performance of TIGaussian on multiple tasks. Code repository: https://github.com/RUiN-jiarun/TIGaussian.
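The abstract does not specify how the multi-branch tokenizer is built. As a rough illustration of what "decoupling the intrinsic properties of 3DGS" could look like, the sketch below assumes each Gaussian is stored as a flat 14-value vector (position, scale, rotation quaternion, opacity, RGB color). It gives each attribute group its own projection branch, concatenates the branch outputs, and mean-pools them into one scene-level vector. The attribute layout, `MultiBranchTokenizer`, the branch size, random weights standing in for learned ones, and mean pooling are all hypothetical choices made for this example. It is not the TIGaussian implementation; the linked repository is the authoritative source.

```python
import math
import random

# ASSUMPTION: each Gaussian is a flat 14-value vector laid out as
# position (3), scale (3), rotation quaternion (4), opacity (1), RGB color (3).
# Real 3DGS files often store spherical-harmonic color coefficients instead.
ATTRIBUTE_SLICES = {
    "position": (0, 3),
    "scale": (3, 6),
    "rotation": (6, 10),
    "opacity": (10, 11),
    "color": (11, 14),
}
GAUSSIAN_DIM = 14


def make_linear(in_dim, out_dim, rng):
    """Random out_dim x in_dim matrix standing in for a learned branch."""
    bound = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-bound, bound) for _ in range(in_dim)]
            for _ in range(out_dim)]


def matvec(weight, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


class MultiBranchTokenizer:
    """Hypothetical tokenizer: one projection branch per 3DGS attribute group."""

    def __init__(self, branch_dim=4, seed=0):
        rng = random.Random(seed)
        self.branch_dim = branch_dim
        self.branches = {
            name: make_linear(hi - lo, branch_dim, rng)
            for name, (lo, hi) in ATTRIBUTE_SLICES.items()
        }

    def encode_gaussian(self, gaussian):
        """Embed each attribute group with its own branch, then concatenate."""
        if len(gaussian) != GAUSSIAN_DIM:
            raise ValueError(f"expected {GAUSSIAN_DIM} values, got {len(gaussian)}")
        latent = []
        for name, (lo, hi) in ATTRIBUTE_SLICES.items():
            latent.extend(matvec(self.branches[name], gaussian[lo:hi]))
        return latent

    def encode_scene(self, gaussians):
        """Mean-pool per-Gaussian latents into one scene-level vector."""
        latents = [self.encode_gaussian(g) for g in gaussians]
        count = len(latents)
        return [sum(values) / count for values in zip(*latents)]


if __name__ == "__main__":
    tokenizer = MultiBranchTokenizer(branch_dim=4, seed=0)
    scene = [[0.1 * i for i in range(GAUSSIAN_DIM)],
             [0.05 * i for i in range(GAUSSIAN_DIM)]]
    print(len(tokenizer.encode_scene(scene)))  # 5 branches x 4 dims = 20
```

In this sketch, separate branches mean that changing only a Gaussian's color alters only the color slice of its latent, while position, scale, rotation, and opacity are encoded independently. A trained model would replace the random matrices with learned (likely nonlinear) networks and could use a richer pooling scheme than a plain mean.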
Cite
Liu et al. "TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment." International Conference on Learning Representations, 2026.
@inproceedings{liu2026iclr-tigaussian,
title = {{TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment}},
author = {Liu, Jiarun and Chen, Qifeng and Zhao, Yiru and Liu, Minghua and Ma, Baorui and Yang, Sheng},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/liu2026iclr-tigaussian/}
}