Deep Video Deblurring: The Devil Is in the Details

Abstract

Video deblurring for hand-held cameras is a challenging task, since the underlying blur is caused by both camera shake and object motion. State-of-the-art deep networks exploit temporal information from neighboring frames, either by means of spatio-temporal transformers or recurrent architectures. In contrast to these involved models, we found that a simple baseline CNN can perform astonishingly well when particular care is taken w.r.t. the details of the model and training procedure. To that end, we conduct a comprehensive study of these crucial details, uncovering extreme differences in quantitative and qualitative performance. Exploiting these details allows us to boost the architecture and training procedure of a simple baseline CNN by a staggering 3.15dB, such that it becomes highly competitive w.r.t. cutting-edge networks. This raises the question of whether the reported accuracy differences between models are always due to technical contributions, or are also subject to such orthogonal, yet crucial, details.

Cite

Text

Gast and Roth. "Deep Video Deblurring: The Devil Is in the Details." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00475

Markdown

[Gast and Roth. "Deep Video Deblurring: The Devil Is in the Details." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/gast2019iccvw-deep/) doi:10.1109/ICCVW.2019.00475

BibTeX

@inproceedings{gast2019iccvw-deep,
  title     = {{Deep Video Deblurring: The Devil Is in the Details}},
  author    = {Gast, Jochen and Roth, Stefan},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {3824--3833},
  doi       = {10.1109/ICCVW.2019.00475},
  url       = {https://mlanthology.org/iccvw/2019/gast2019iccvw-deep/}
}