Enhancing NeRF Akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts

Abstract

Cross-scene generalizable NeRF models, which can directly synthesize novel views of unseen scenes, have become a new spotlight of the NeRF field. Several existing attempts rely on increasingly end-to-end "neuralized" architectures, i.e., replacing scene representation and/or rendering modules with performant neural networks such as transformers, turning novel view synthesis into a feed-forward inference pipeline. Since those feed-forward "neuralized" architectures still do not fit diverse scenes well out of the box, we propose bridging them with the powerful Mixture-of-Experts (MoE) idea from large language models (LLMs), which has demonstrated superior generalization by balancing larger overall model capacity with flexible per-instance specialization. Starting from a recent generalizable NeRF architecture called GNT, we first demonstrate that MoE can be neatly plugged in to enhance the model. We further customize a shared permanent expert and a geometry-aware consistency loss to enforce cross-scene consistency and spatial smoothness, respectively, both of which are essential for generalizable view synthesis. Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), achieves state-of-the-art results when transferring to unseen scenes, indicating remarkably better cross-scene generalization in both zero-shot and few-shot settings. Our code is available at https://github.com/VITA-Group/GNT-MOVE.
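The core mechanism the abstract describes, a sparsely gated MoE layer augmented with a shared permanent expert that is always active, can be sketched roughly as below. This is a minimal NumPy illustration under assumed shapes and linear experts; all names (`moe_layer`, `gate_w`, `permanent`, the dimensions `D`, `E`, `K`) are hypothetical and not taken from the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D, E, K = 8, 4, 2  # feature dim, number of routed experts, top-K selection

# Hypothetical parameters: each routed expert is a small linear map; one extra
# "permanent" expert is applied to every token to encourage cross-scene consistency.
experts = [rng.standard_normal((D, D)) * 0.1 for _ in range(E)]
permanent = rng.standard_normal((D, D)) * 0.1
gate_w = rng.standard_normal((D, E)) * 0.1

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x):
    """Route each token to its top-K experts, always adding the permanent expert."""
    probs = softmax(x @ gate_w)                  # (N, E) gating weights per token
    topk = np.argsort(-probs, axis=-1)[:, :K]    # indices of the K highest-scoring experts
    out = x @ permanent                          # shared expert: no gating, always on
    for n in range(x.shape[0]):
        w = probs[n, topk[n]]
        w = w / w.sum()                          # renormalize selected gate weights
        for k, e_idx in enumerate(topk[n]):
            out[n] += w[k] * (x[n] @ experts[e_idx])
    return out

tokens = rng.standard_normal((5, D))  # e.g. features of sampled points along rays
y = moe_layer(tokens)
print(y.shape)  # (5, 8)
```

In GNT-MOVE such a layer would replace the feed-forward sublayer inside the transformer renderer, so different "view experts" specialize per scene or region while the permanent expert preserves behavior shared across scenes.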

Cite

Text

Cong et al. "Enhancing NeRF Akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00296

Markdown

[Cong et al. "Enhancing NeRF Akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/cong2023iccv-enhancing/) doi:10.1109/ICCV51070.2023.00296

BibTeX

@inproceedings{cong2023iccv-enhancing,
  title     = {{Enhancing NeRF Akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts}},
  author    = {Cong, Wenyan and Liang, Hanxue and Wang, Peihao and Fan, Zhiwen and Chen, Tianlong and Varma, Mukund and Wang, Yi and Wang, Zhangyang},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {3193--3204},
  doi       = {10.1109/ICCV51070.2023.00296},
  url       = {https://mlanthology.org/iccv/2023/cong2023iccv-enhancing/}
}