DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos

Abstract

Existing implicit neural representation (INR) methods do not fully exploit the spatiotemporal redundancies in videos. Index-based INRs ignore content-specific spatial features, and hybrid INRs ignore the contextual dependency on adjacent frames, leading to poor modeling capability for scenes with large motion or dynamics. We analyze this limitation from the perspective of function fitting and reveal the importance of frame difference. To exploit explicit motion information, we propose the Difference Neural Representation for Videos (DNeRV), which consists of two streams for content and frame difference. We also introduce a collaborative content unit for effective feature fusion. We test DNeRV on video compression, inpainting, and interpolation. DNeRV achieves competitive results against state-of-the-art neural compression approaches and outperforms existing implicit methods on downstream inpainting and interpolation for 960×1920 videos.
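To make the two-stream idea concrete, here is a minimal PyTorch sketch: one encoder takes the current frame (content stream), another takes its backward and forward frame differences (difference stream), and a gated unit fuses the two feature maps before a lightweight decoder reconstructs the frame. Everything beyond that outline, including the module names (CollaborativeUnit, TwoStreamSketch), channel widths, and the sigmoid/tanh gating, is an assumption for illustration rather than the paper's actual DNeRV architecture or collaborative content unit.

import torch
import torch.nn as nn

class CollaborativeUnit(nn.Module):
    # Gated fusion of content and difference features. Illustrative only;
    # the paper's collaborative content unit may differ in detail.
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.proj = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, content, diff):
        x = torch.cat([content, diff], dim=1)
        g = torch.sigmoid(self.gate(x))      # per-pixel fusion weights
        h = torch.tanh(self.proj(x))         # candidate merged features
        return g * content + (1 - g) * h     # gated blend of the two streams

class TwoStreamSketch(nn.Module):
    # One encoder for the current frame (content stream), one for its
    # backward/forward frame differences (difference stream).
    def __init__(self, channels=64):
        super().__init__()
        self.content_enc = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.GELU())
        self.diff_enc = nn.Sequential(
            nn.Conv2d(6, channels, 3, stride=2, padding=1), nn.GELU())
        self.fuse = CollaborativeUnit(channels)
        self.decoder = nn.Sequential(
            nn.Conv2d(channels, 4 * 3, 3, padding=1), nn.PixelShuffle(2))

    def forward(self, frame, prev_frame, next_frame):
        # Two frame differences (backward and forward) give 6 input channels.
        diffs = torch.cat([frame - prev_frame, next_frame - frame], dim=1)
        fused = self.fuse(self.content_enc(frame), self.diff_enc(diffs))
        return torch.sigmoid(self.decoder(fused))

# Usage on a downscaled clip (the paper evaluates 960×1920 videos).
model = TwoStreamSketch()
f_prev, f, f_next = (torch.rand(1, 3, 240, 480) for _ in range(3))
out = model(f, f_prev, f_next)   # (1, 3, 240, 480)

The gated blend lets the network fall back on pure content features where the scene is static and lean on difference features where motion dominates, which matches the abstract's motivation for using frame differences as explicit motion information.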

Cite

Text

Zhao et al. "DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00202

Markdown

[Zhao et al. "DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/zhao2023cvpr-dnerv/) doi:10.1109/CVPR52729.2023.00202

BibTeX

@inproceedings{zhao2023cvpr-dnerv,
  title     = {{DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos}},
  author    = {Zhao, Qi and Asif, M. Salman and Ma, Zhan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {2031--2040},
  doi       = {10.1109/CVPR52729.2023.00202},
  url       = {https://mlanthology.org/cvpr/2023/zhao2023cvpr-dnerv/}
}