Monocular Dynamic View Synthesis: A Reality Check
Abstract
We study recent progress on dynamic view synthesis (DVS) from monocular video. Though existing approaches have demonstrated impressive results, we show a discrepancy between the practical capture process and the existing experimental protocols that effectively leaks multi-view signals into training. We define effective multi-view factors (EMFs) to quantify the amount of multi-view signal present in the input capture sequence based on the relative camera-scene motion. We introduce two new metrics, co-visibility masked image metrics and correspondence accuracy, that overcome the issue in existing protocols. We also propose a new iPhone dataset that includes more diverse real-life deformation sequences. Using our proposed experimental protocol, we show that state-of-the-art approaches suffer a 1-2 dB drop in masked PSNR in the absence of multi-view cues and a 4-5 dB drop when modeling complex motion. Code and data can be found at http://hangg7.com/dycheck.
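To make the co-visibility masked metrics concrete, below is a minimal Python sketch of a masked PSNR: standard PSNR evaluated only on pixels deemed co-visible in the training views. The function name, the [0, 1] value range, and the precomputed boolean mask are illustrative assumptions, not the dycheck API.

```python
import numpy as np

def masked_psnr(pred: np.ndarray, gt: np.ndarray, covis_mask: np.ndarray) -> float:
    """PSNR restricted to co-visible pixels (illustrative sketch).

    pred, gt:   rendered and ground-truth images, float in [0, 1], shape (H, W, 3).
    covis_mask: boolean (H, W) mask, True where a pixel was observed in
                enough training views (assumed to be precomputed, e.g. from
                correspondence counts across training frames).
    """
    # Only score pixels the method could plausibly have learned; regions
    # never seen during training are excluded from the error.
    diff = (pred - gt)[covis_mask]  # -> (num_covisible_pixels, 3)
    mse = float(np.mean(diff ** 2))
    return 10.0 * np.log10(1.0 / mse)  # peak value is 1.0 for [0, 1] images
```

Masked variants of other image metrics follow the same recipe: evaluate the underlying metric only on, or weighted by, the co-visibility mask.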
Cite
Text

Gao et al. "Monocular Dynamic View Synthesis: A Reality Check." Neural Information Processing Systems, 2022.

Markdown

[Gao et al. "Monocular Dynamic View Synthesis: A Reality Check." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/gao2022neurips-monocular/)

BibTeX
@inproceedings{gao2022neurips-monocular,
title = {{Monocular Dynamic View Synthesis: A Reality Check}},
author = {Gao, Hang and Li, Ruilong and Tulsiani, Shubham and Russell, Bryan and Kanazawa, Angjoo},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/gao2022neurips-monocular/}
}