Video Inpainting by Jointly Learning Temporal Structure and Spatial Details
Abstract
We present a new data-driven video inpainting method for recovering missing regions of video frames. We propose a novel deep learning architecture that contains two sub-networks: a temporal structure inference network and a spatial detail recovering network. The temporal structure inference network is built on a 3D fully convolutional architecture; given the high computational cost of 3D convolution, it learns to complete only a low-resolution video volume. The low-resolution result provides temporal guidance to the spatial detail recovering network, which performs image-based inpainting with a 2D fully convolutional network to produce recovered video frames at their original resolution. This two-step design ensures both the spatial quality of each frame and temporal coherence across frames. Our method jointly trains both sub-networks in an end-to-end manner. We provide qualitative and quantitative evaluations on three datasets, demonstrating that our method outperforms previous learning-based video inpainting methods.
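The abstract describes a two-stage design: a 3D fully convolutional network completes a downsampled video volume, and a 2D fully convolutional network inpaints each frame at full resolution using that coarse result as guidance. Below is a minimal PyTorch sketch of that structure. All module names (`TemporalStructureNet`, `SpatialDetailNet`), layer sizes, the 4x downsampling factor, and the concatenation-based fusion of the guidance are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two-stage design (not the authors' code).
# Stage 1: a 3D FCN completes a low-resolution video volume.
# Stage 2: a 2D FCN inpaints each full-resolution frame, guided by the
# upsampled stage-1 output.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalStructureNet(nn.Module):
    """3D FCN on a low-resolution video volume of shape (B, C, T, H/4, W/4)."""

    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(4, ch, 3, padding=1), nn.ReLU(inplace=True),  # RGB + mask
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(ch, 3, 3, padding=1),                         # coarse RGB
        )

    def forward(self, lowres_video, lowres_mask):
        return self.body(torch.cat([lowres_video, lowres_mask], dim=1))


class SpatialDetailNet(nn.Module):
    """2D FCN inpainting one full-resolution frame, conditioned on guidance."""

    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(7, ch, 3, padding=1), nn.ReLU(inplace=True),  # RGB+mask+guide
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, frame, mask, guidance):
        return self.body(torch.cat([frame, mask, guidance], dim=1))


class TwoStageInpainter(nn.Module):
    def __init__(self):
        super().__init__()
        self.temporal = TemporalStructureNet()
        self.spatial = SpatialDetailNet()

    def forward(self, video, mask):
        # video: (B, 3, T, H, W); mask: (B, 1, T, H, W) with 1 = missing.
        b, _, t, h, w = video.shape
        low_v = F.interpolate(video, size=(t, h // 4, w // 4),
                              mode="trilinear", align_corners=False)
        low_m = F.interpolate(mask, size=(t, h // 4, w // 4), mode="nearest")
        coarse = self.temporal(low_v, low_m)       # stage 1: temporal structure
        frames = []
        for i in range(t):                         # stage 2: per-frame details
            guide = F.interpolate(coarse[:, :, i], size=(h, w),
                                  mode="bilinear", align_corners=False)
            out = self.spatial(video[:, :, i], mask[:, :, i], guide)
            # Keep known pixels; fill only the missing region.
            frames.append(video[:, :, i] * (1 - mask[:, :, i])
                          + out * mask[:, :, i])
        return torch.stack(frames, dim=2)
```

A real implementation would replace these tiny convolutional stacks with encoder-decoder sub-networks and train both stages jointly under reconstruction (and possibly adversarial) losses; the sketch only shows how the low-resolution temporal output guides per-frame 2D inpainting.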
Cite
Text
Wang et al. "Video Inpainting by Jointly Learning Temporal Structure and Spatial Details." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33015232
Markdown
[Wang et al. "Video Inpainting by Jointly Learning Temporal Structure and Spatial Details." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/wang2019aaai-video/) doi:10.1609/AAAI.V33I01.33015232
BibTeX
@inproceedings{wang2019aaai-video,
title = {{Video Inpainting by Jointly Learning Temporal Structure and Spatial Details}},
author = {Wang, Chuan and Huang, Haibin and Han, Xiaoguang and Wang, Jue},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2019},
pages = {5232--5239},
doi = {10.1609/AAAI.V33I01.33015232},
url = {https://mlanthology.org/aaai/2019/wang2019aaai-video/}
}