C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning
Abstract
3D Gaussian splatting (3DGS) has demonstrated exceptional performance in image-based 3D reconstruction and real-time rendering. However, regions with complex textures require numerous Gaussians to capture significant color variations accurately, which slows rendering. To address this challenge, we introduce a hybrid representation for indoor scenes that combines 3DGS with textured meshes. Our approach uses textured meshes to handle texture-rich flat areas, while retaining Gaussians to model intricate geometries. The proposed method begins by pruning and refining the extracted mesh to eliminate geometrically complex regions. We then employ a joint optimization for 3DGS and mesh, incorporating a warm-up strategy and transmittance-aware supervision to balance their contributions seamlessly. Extensive experiments demonstrate that the hybrid representation maintains comparable rendering quality and achieves higher frames per second (FPS) with fewer Gaussian primitives.
Cite
Text
Ma et al. "C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/128
Markdown
[Ma et al. "C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/ma2024ijcai-c/) doi:10.24963/ijcai.2024/128
BibTeX
@inproceedings{ma2024ijcai-c,
title = {{C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning}},
author = {Ma, Ji and Suo, Wei and Wang, Peng and Zhang, Yanning},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2024},
pages = {1155--1163},
doi = {10.24963/ijcai.2024/128},
url = {https://mlanthology.org/ijcai/2024/ma2024ijcai-c/}
}