TubeFormer-DeepLab: Video Mask Transformer
Abstract
We present TubeFormer-DeepLab, the first attempt to tackle multiple core video segmentation tasks in a unified manner. Different video segmentation tasks (e.g., video semantic/instance/panoptic segmentation) are usually treated as distinct problems. State-of-the-art models adopted in the separate communities have diverged, and radically different approaches dominate in each task. By contrast, we make a crucial observation: video segmentation tasks can generally be formulated as the problem of assigning different predicted labels to video tubes (where a tube is obtained by linking segmentation masks along the time axis), and the labels may encode different values depending on the target task. This observation motivates us to develop TubeFormer-DeepLab, a simple and effective video mask transformer model that is widely applicable to multiple video segmentation tasks. TubeFormer-DeepLab directly predicts video tubes with task-specific labels (either pure semantic categories, or both semantic categories and instance identities), which not only significantly simplifies video segmentation models, but also advances state-of-the-art results on multiple video segmentation benchmarks.
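The tube formulation above can be illustrated with a minimal sketch (not the authors' code; the `Tube` type and `tube_label` helper are hypothetical): a tube is a sequence of per-frame masks linked along the time axis, and the same predicted tube serves different tasks depending on how its label is read.

```python
# Toy illustration of the "tube" abstraction, assuming a simple
# per-frame binary-mask representation (not the paper's implementation).
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Mask = List[List[int]]  # toy 2D binary mask for one frame


@dataclass
class Tube:
    masks: List[Mask]                   # one mask per frame, linked over time
    semantic_class: int                 # semantic category (all tasks)
    instance_id: Optional[int] = None   # set for instance/panoptic; None otherwise


def tube_label(tube: Tube, task: str) -> Union[int, Tuple[int, Optional[int]]]:
    # Task-specific reading of the tube's label: semantic segmentation
    # needs only the category; instance/panoptic also need the identity.
    if task == "semantic":
        return tube.semantic_class
    return (tube.semantic_class, tube.instance_id)


# Example: a 2-frame tube for one object of (hypothetical) class 3, identity 7.
car = Tube(
    masks=[[[0, 1], [1, 1]], [[1, 1], [0, 1]]],
    semantic_class=3,
    instance_id=7,
)
print(tube_label(car, "semantic"))  # 3
print(tube_label(car, "panoptic"))  # (3, 7)
```

The point of the sketch is only that one output structure (masks linked over time plus a label) covers semantic, instance, and panoptic video segmentation by varying what the label encodes.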
Cite
Text
Kim et al. "TubeFormer-DeepLab: Video Mask Transformer." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.01354
Markdown
[Kim et al. "TubeFormer-DeepLab: Video Mask Transformer." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/kim2022cvpr-tubeformerdeeplab/) doi:10.1109/CVPR52688.2022.01354
BibTeX
@inproceedings{kim2022cvpr-tubeformerdeeplab,
title = {{TubeFormer-DeepLab: Video Mask Transformer}},
author = {Kim, Dahun and Xie, Jun and Wang, Huiyu and Qiao, Siyuan and Yu, Qihang and Kim, Hong-Seok and Adam, Hartwig and Kweon, In So and Chen, Liang-Chieh},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2022},
pages = {13914-13924},
doi = {10.1109/CVPR52688.2022.01354},
url = {https://mlanthology.org/cvpr/2022/kim2022cvpr-tubeformerdeeplab/}
}