Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention
Abstract
In this paper, we aim to learn a semantic radiance field from multiple scenes that is accurate, efficient and generalizable. While most existing NeRFs target neural scene rendering, image synthesis and multi-view reconstruction, a few attempts, such as Semantic-NeRF, explore learning high-level semantic understanding with the NeRF structure. However, Semantic-NeRF simultaneously learns color and semantic labels from a single ray with multiple heads, and a single ray fails to provide rich semantic information. As a result, Semantic-NeRF relies on positional encoding and needs to train one specific model for each scene. To address this, we propose Semantic Ray (S-Ray) to fully exploit semantic information along the ray direction from its multi-view reprojections. As directly performing dense attention over multi-view reprojected rays would incur a heavy computational cost, we design a Cross-Reprojection Attention module with consecutive intra-view radial and cross-view sparse attentions, which decomposes contextual information along reprojected rays and across multiple views and then collects dense connections by stacking the modules. Experiments show that S-Ray is able to learn from multiple scenes and generalizes well to unseen scenes.
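The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the reprojected-ray features form a tensor of shape `(m, R, d)` (`m` source views, `R` samples along the ray, `d` channels), uses identity query/key/value projections for brevity, and interprets "cross-view sparse attention" as attention across views at the same sample index. The function names `self_attention`, `cross_reprojection_block` and `pairwise_scores` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x has shape (..., n, d). Identity Q/K/V projections are an assumption
    made to keep the sketch short; a real module would learn them.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)   # (..., n, n)
    return softmax(scores, axis=-1) @ x                # (..., n, d)

def cross_reprojection_block(feats):
    """One factorized block over feats of shape (m, R, d)."""
    # Intra-view radial step: within each view, attend along the R samples.
    feats = feats + self_attention(feats)
    # Cross-view step (assumed form): for each sample index, attend
    # across the m views.
    per_sample = np.swapaxes(feats, 0, 1)              # (R, m, d)
    per_sample = per_sample + self_attention(per_sample)
    return np.swapaxes(per_sample, 0, 1)               # (m, R, d)

def pairwise_scores(m, R):
    """Attention scores computed by dense vs. factorized attention."""
    dense = (m * R) ** 2
    factorized = m * R * R + R * m * m
    return dense, factorized

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 8, 16))                # m=4, R=8, d=16
    out = feats
    for _ in range(2):                                 # stack two blocks
        out = cross_reprojection_block(out)
    print(out.shape, pairwise_scores(4, 8))
```

Under these assumptions, dense attention over all `m * R` tokens computes `(m * R)^2` scores, while the two factorized steps compute `m * R^2 + R * m^2`, which is where the saving comes from.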
Cite
Text
Liu et al. "Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01668
Markdown
[Liu et al. "Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/liu2023cvpr-semantic/) doi:10.1109/CVPR52729.2023.01668
BibTeX
@inproceedings{liu2023cvpr-semantic,
title = {{Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention}},
author = {Liu, Fangfu and Zhang, Chubin and Zheng, Yu and Duan, Yueqi},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {17386-17396},
doi = {10.1109/CVPR52729.2023.01668},
url = {https://mlanthology.org/cvpr/2023/liu2023cvpr-semantic/}
}