Mixed Neural Voxels for Fast Multi-View Video Synthesis

Abstract

Synthesizing high-fidelity videos from real-world multi-view input is challenging because real-world environments are complex and often contain highly dynamic motion. Previous works based on neural radiance fields have demonstrated high-quality reconstructions of dynamic scenes. However, training such models on real-world scenes is time-consuming, usually taking days or weeks. In this paper, we present a novel method named MixVoxels that represents dynamic scenes efficiently, leading to fast training and rendering. MixVoxels represents a 4D dynamic scene as a mixture of static and dynamic voxels and processes them with different networks. In this way, the required modalities for static voxels can be computed by a lightweight model, which reduces the overall computation, since many everyday dynamic scenes are dominated by static backgrounds. To distinguish the two kinds of voxels, we propose a novel variation field that estimates the temporal variance of each voxel. For the dynamic representation, we design an inner-product time query method to efficiently query multiple time steps, which is essential for recovering highly dynamic motion. As a result, with 15 minutes of training on dynamic scenes with 300-frame input videos, MixVoxels achieves better PSNR than previous methods. For rendering, MixVoxels can render a novel-view video at 1K resolution at 37 fps. Code and trained models are available at https://github.com/fengres/mixvoxels.
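
The two ideas named in the abstract, splitting voxels by temporal variation and querying many time steps with an inner product, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the use of a plain standard deviation as a stand-in for the learned variation field, the threshold value, and the tiny hand-written time basis are all assumptions made for illustration only.

```python
def temporal_std(series):
    """Standard deviation over time of one voxel's observed values.

    Assumption: a simple statistic standing in for the learned variation field.
    """
    mean = sum(series) / len(series)
    return (sum((v - mean) ** 2 for v in series) / len(series)) ** 0.5


def split_voxels(variation, threshold):
    """Partition voxel indices into (static, dynamic) by thresholding variation."""
    static_ids = [i for i, v in enumerate(variation) if v <= threshold]
    dynamic_ids = [i for i, v in enumerate(variation) if v > threshold]
    return static_ids, dynamic_ids


def inner_product_time_query(feature, time_basis):
    """Evaluate one voxel at every time step at once.

    out[t] is the inner product of the voxel feature with row t of time_basis,
    so all T time steps come from a single matrix-vector product.
    """
    return [sum(f * b for f, b in zip(feature, row)) for row in time_basis]


# Hypothetical per-voxel value series over time (4 voxels, 2 frames each).
series_per_voxel = [[1.0, 1.0], [0.0, 2.0], [3.0, 3.0], [0.0, 4.0]]
variation = [temporal_std(s) for s in series_per_voxel]

# Hypothetical threshold; static voxels would go to the lightweight model.
static_ids, dynamic_ids = split_voxels(variation, threshold=0.1)

# Hypothetical 2-D voxel feature and a 3-step time basis (T x D).
feature = [1.0, 2.0]
time_basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values_over_time = inner_product_time_query(feature, time_basis)
```

In this sketch only the voxels in `dynamic_ids` would need the time query; voxels in `static_ids` have one time-independent value, which is where the computational saving described in the abstract would come from.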

Cite

Text

Wang et al. "Mixed Neural Voxels for Fast Multi-View Video Synthesis." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01805

Markdown

[Wang et al. "Mixed Neural Voxels for Fast Multi-View Video Synthesis." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/wang2023iccv-mixed/) doi:10.1109/ICCV51070.2023.01805

BibTeX

@inproceedings{wang2023iccv-mixed,
  title     = {{Mixed Neural Voxels for Fast Multi-View Video Synthesis}},
  author    = {Wang, Feng and Tan, Sinan and Li, Xinghang and Tian, Zeyue and Song, Yafei and Liu, Huaping},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {19706--19716},
  doi       = {10.1109/ICCV51070.2023.01805},
  url       = {https://mlanthology.org/iccv/2023/wang2023iccv-mixed/}
}