Efficient Video Object Segmentation via Network Modulation

Abstract

Video object segmentation aims to segment a specific object throughout a video sequence, given only an annotated first frame. Recent deep-learning-based approaches find it effective to fine-tune a general-purpose segmentation model on the annotated frame using hundreds of iterations of gradient descent. Despite the high accuracy that these methods achieve, the fine-tuning process is inefficient and fails to meet the requirements of real-world applications. We propose a novel approach that uses a single forward pass to adapt the segmentation model to the appearance of a specific object. Specifically, a second meta neural network, named the modulator, is trained to manipulate the intermediate layers of the segmentation network given limited visual and spatial information of the target object. Experiments show that our approach is 70 times faster than fine-tuning approaches and achieves similar accuracy.
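
To make the idea of "manipulating intermediate layers" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that modulation takes the form of a per-channel scale (standing in for the visual information) plus a per-pixel bias (standing in for the spatial information) applied to an intermediate feature map. The abstract does not specify this form, and the names `modulate`, `channel_scale`, and `spatial_bias` are hypothetical. In a real system a modulator network would predict both sets of parameters in one forward pass; here they are hard-coded toy values.

```python
from typing import List

# Feature map indexed as [channel][row][col]; plain lists keep the sketch dependency-free.
FeatureMap = List[List[List[float]]]


def modulate(features: FeatureMap,
             channel_scale: List[float],
             spatial_bias: List[List[float]]) -> FeatureMap:
    """Apply a per-channel scale and a per-pixel bias to a feature map.

    Assumption: modulation is an affine transform of intermediate features.
    The abstract only states that the modulator "manipulates" these layers.
    """
    if len(channel_scale) != len(features):
        raise ValueError("need exactly one scale per channel")
    return [
        [
            [scale * value + bias for value, bias in zip(row, bias_row)]
            for row, bias_row in zip(channel, spatial_bias)
        ]
        for channel, scale in zip(features, channel_scale)
    ]


# Toy input: 2 channels on a 2x2 spatial grid.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
# Hypothetical modulator outputs (hard-coded here, predicted by a network in practice).
channel_scale = [2.0, 0.0]                # amplify channel 0, suppress channel 1
spatial_bias = [[0.5, 0.0], [0.0, 0.0]]   # boost the top-left location in every channel

modulated = modulate(features, channel_scale, spatial_bias)
```

Because the segmentation weights themselves stay fixed in this sketch, adapting to a new object costs only the computation of the scale and bias values, which is the kind of saving over gradient-based fine-tuning that the abstract describes.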

Cite

Text

Yang et al. "Efficient Video Object Segmentation via Network Modulation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. doi:10.1109/CVPR.2018.00680

Markdown

[Yang et al. "Efficient Video Object Segmentation via Network Modulation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.](https://mlanthology.org/cvpr/2018/yang2018cvpr-efficient/) doi:10.1109/CVPR.2018.00680

BibTeX

@inproceedings{yang2018cvpr-efficient,
  title     = {{Efficient Video Object Segmentation via Network Modulation}},
  author    = {Yang, Linjie and Wang, Yanran and Xiong, Xuehan and Yang, Jianchao and Katsaggelos, Aggelos K.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2018},
  doi       = {10.1109/CVPR.2018.00680},
  url       = {https://mlanthology.org/cvpr/2018/yang2018cvpr-efficient/}
}