Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
Abstract
We introduce Dr. Splat, a novel approach for open-vocabulary 3D scene understanding leveraging 3D Gaussian Splatting. Unlike existing language-embedded 3DGS methods, which rely on a rendering process, our method directly associates language-aligned CLIP embeddings with 3D Gaussians for holistic 3D scene understanding. The key to our method is a language feature registration technique in which CLIP embeddings are assigned to the dominant Gaussians intersected by each pixel-ray. Moreover, we integrate Product Quantization (PQ) trained on general large-scale image data to compactly represent embeddings without per-scene optimization. Experiments demonstrate that our approach significantly outperforms existing approaches on 3D perception benchmarks, such as open-vocabulary 3D semantic segmentation, 3D object localization, and 3D object selection tasks.
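To make the registration idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each pixel-ray comes with a list of intersected Gaussian indices and a per-Gaussian contribution weight, that "dominant" means the `top_k` largest weights, and that embeddings are aggregated by a weighted average. The function name `register_embeddings` and all of its parameters are hypothetical; the abstract does not specify any of these details.

```python
import numpy as np

def register_embeddings(pixel_embeddings, gaussian_ids, weights, num_gaussians, top_k=2):
    """Accumulate per-pixel embeddings onto the dominant Gaussians of each ray.

    pixel_embeddings: (P, D) one language-aligned embedding per pixel-ray
    gaussian_ids:     (P, M) indices of the Gaussians intersected by each ray
    weights:          (P, M) contribution of each Gaussian to its ray
                      (assumed input; how it is computed is not covered here)
    """
    dim = pixel_embeddings.shape[1]
    feat_sum = np.zeros((num_gaussians, dim))
    weight_sum = np.zeros(num_gaussians)

    # Positions of the top_k largest weights along each ray.
    top = np.argsort(-weights, axis=1)[:, :top_k]
    top_ids = np.take_along_axis(gaussian_ids, top, axis=1)  # (P, top_k)
    top_w = np.take_along_axis(weights, top, axis=1)         # (P, top_k)

    for k in range(top_k):
        # np.add.at accumulates correctly when a Gaussian index repeats.
        np.add.at(feat_sum, top_ids[:, k], top_w[:, k, None] * pixel_embeddings)
        np.add.at(weight_sum, top_ids[:, k], top_w[:, k])

    # Weighted average, then unit-normalize for cosine-similarity queries.
    registered = np.zeros_like(feat_sum)
    hit = weight_sum > 0
    registered[hit] = feat_sum[hit] / weight_sum[hit, None]
    norms = np.linalg.norm(registered[hit], axis=1, keepdims=True)
    registered[hit] = registered[hit] / np.maximum(norms, 1e-12)
    return registered, hit

# Toy usage: two rays, five Gaussians, keep only the single dominant Gaussian.
pixel_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
gaussian_ids = np.array([[0, 1, 2], [2, 1, 3]])
weights = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])
registered, hit = register_embeddings(pixel_embeddings, gaussian_ids, weights,
                                      num_gaussians=5, top_k=1)
```

In this toy case only Gaussians 0 and 2 receive an embedding; the `hit` mask marks which Gaussians were touched by any ray, so untouched ones can be skipped at query time.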
Cite
Text
Jun-Seong et al. "Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01319
Markdown
[Jun-Seong et al. "Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/junseong2025cvpr-dr/) doi:10.1109/CVPR52734.2025.01319
BibTeX
@inproceedings{junseong2025cvpr-dr,
title = {{Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration}},
author = {Jun-Seong, Kim and Kim, GeonU and Yu-Ji, Kim and Wang, Yu-Chiang Frank and Choe, Jaesung and Oh, Tae-Hyun},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {14137-14146},
doi = {10.1109/CVPR52734.2025.01319},
url = {https://mlanthology.org/cvpr/2025/junseong2025cvpr-dr/}
}