Clockwork Convnets for Video Semantic Segmentation
Abstract
Recent years have seen tremendous progress in still-image segmentation; however, the naive application of these state-of-the-art algorithms to every video frame requires considerable computation and ignores the temporal continuity inherent in video. We propose a video recognition framework that relies on two key observations: 1) while pixels may change rapidly from frame to frame, the semantic content of a scene evolves more slowly, and 2) execution can be viewed as an aspect of architecture, yielding purpose-fit computation schedules for networks. We define a novel family of "clockwork" convnets driven by fixed or adaptive clock signals that schedule the processing of different layers at different update rates according to their semantic stability. We design a pipeline schedule to reduce latency for real-time recognition and a fixed-rate schedule to reduce overall computation. Finally, we extend clockwork scheduling to adaptive video processing by incorporating data-driven clocks that can be tuned on unlabeled video. The accuracy and efficiency of clockwork convnets are evaluated on the Youtube-Objects, NYUD, and Cityscapes video datasets.
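The fixed-rate idea in the abstract (different layers updated at different rates, with stale outputs reused in between) can be illustrated with a toy sketch. This is not the authors' implementation: the function name `clockwork_run`, the scalar "frames", the plain chain of stages without skip connections, and the periods `[1, 2, 4]` are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a network stage: maps an input value to an output value.
Stage = Callable[[float], float]


def clockwork_run(
    frames: Sequence[float],
    stages: Sequence[Stage],
    periods: Sequence[int],
) -> Tuple[List[float], List[int]]:
    """Run a chain of stages over a frame sequence on a fixed-rate clock.

    Stage i is recomputed only on frames where t % periods[i] == 0;
    on all other frames its cached output from the last update is reused.
    Returns the per-frame final outputs and how often each stage ran.
    """
    if len(stages) != len(periods):
        raise ValueError("need exactly one period per stage")

    cache: List[float] = [0.0] * len(stages)
    update_counts = [0] * len(stages)
    outputs: List[float] = []

    for t, frame in enumerate(frames):
        x = frame
        for i, (stage, period) in enumerate(zip(stages, periods)):
            if t % period == 0:
                # Clock fires: recompute this stage from the current input.
                cache[i] = stage(x)
                update_counts[i] += 1
            # Otherwise fall through and reuse the stale cached output.
            x = cache[i]
        outputs.append(x)

    return outputs, update_counts


# Toy usage: shallow stage runs every frame, deeper stages run less often.
toy_stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
toy_outputs, toy_counts = clockwork_run(range(8), toy_stages, [1, 2, 4])
print(toy_outputs, toy_counts)
```

In this sketch the deepest stage runs on only two of the eight frames, which is where the computational saving would come from; an adaptive clock would replace the `t % period == 0` test with a data-driven condition.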
Cite
Text
Shelhamer et al. "Clockwork Convnets for Video Semantic Segmentation." European Conference on Computer Vision Workshops, 2016. doi:10.1007/978-3-319-49409-8_69
Markdown
[Shelhamer et al. "Clockwork Convnets for Video Semantic Segmentation." European Conference on Computer Vision Workshops, 2016.](https://mlanthology.org/eccvw/2016/shelhamer2016eccvw-clockwork/) doi:10.1007/978-3-319-49409-8_69
BibTeX
@inproceedings{shelhamer2016eccvw-clockwork,
title = {{Clockwork Convnets for Video Semantic Segmentation}},
author = {Shelhamer, Evan and Rakelly, Kate and Hoffman, Judy and Darrell, Trevor},
booktitle = {European Conference on Computer Vision Workshops},
year = {2016},
pages = {852-868},
doi = {10.1007/978-3-319-49409-8_69},
url = {https://mlanthology.org/eccvw/2016/shelhamer2016eccvw-clockwork/}
}