Connecting NeRFs, Images, and Text
Abstract
Neural Radiance Fields (NeRFs) have emerged as a standard framework for representing 3D scenes and objects, introducing a novel data type for information exchange and storage. Concurrently, significant progress has been made in multimodal representation learning for text and image data. This paper explores a novel research direction that aims to connect the NeRF modality with other modalities, similar to established methodologies for images and text. To this end, we propose a simple framework that exploits pre-trained models for NeRF representations alongside multimodal models for text and image processing. Our framework learns a bidirectional mapping between NeRF embeddings and those obtained from corresponding images and text. This mapping unlocks several novel and useful applications, including NeRF zero-shot classification and NeRF retrieval from images or text.
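The core idea described above can be sketched as follows: project a NeRF embedding into a shared image/text embedding space (e.g., a CLIP-like space) via a learned mapping, then perform zero-shot classification by cosine similarity against class-prompt text embeddings. This is a minimal illustrative sketch, not the authors' actual architecture; the embedding dimensions, the linear adapter `W`, and the function names are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D_NERF, D_SHARED = 1024, 512              # assumed embedding sizes
W = rng.normal(size=(D_NERF, D_SHARED))   # placeholder for a trained NeRF-to-shared-space adapter

def map_nerf_to_shared(nerf_emb: np.ndarray) -> np.ndarray:
    """Project a NeRF embedding into the shared space and L2-normalize it."""
    z = nerf_emb @ W
    return z / np.linalg.norm(z)

def zero_shot_classify(nerf_emb: np.ndarray, text_embs: np.ndarray) -> int:
    """Return the index of the class whose text embedding is most similar."""
    z = map_nerf_to_shared(nerf_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return int(np.argmax(t @ z))          # cosine similarity after normalization

# Toy usage: classify one (random) NeRF embedding against three class prompts.
nerf_emb = rng.normal(size=D_NERF)
text_embs = rng.normal(size=(3, D_SHARED))  # stands in for encoded text prompts
pred = zero_shot_classify(nerf_emb, text_embs)
```

Retrieval from images or text follows the same pattern: rank NeRF embeddings in the shared space by their cosine similarity to a query embedding.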
Cite
Text
Ballerini et al. "Connecting NeRFs, Images, and Text." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00092
Markdown
[Ballerini et al. "Connecting NeRFs, Images, and Text." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/ballerini2024cvprw-connecting/) doi:10.1109/CVPRW63382.2024.00092
BibTeX
@inproceedings{ballerini2024cvprw-connecting,
title = {{Connecting NeRFs, Images, and Text}},
author = {Ballerini, Francesco and Ramirez, Pierluigi Zama and Mirabella, Roberto and Salti, Samuele and Di Stefano, Luigi},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
  pages = {866--876},
doi = {10.1109/CVPRW63382.2024.00092},
url = {https://mlanthology.org/cvprw/2024/ballerini2024cvprw-connecting/}
}