A Generalist Framework for Panoptic Segmentation of Images and Videos

Abstract

Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image. Because permutations of instance IDs are also valid solutions, the task requires learning a high-dimensional one-to-many mapping. As a result, state-of-the-art approaches use customized architectures and task-specific loss functions. We formulate panoptic segmentation as a discrete data generation problem, without relying on the inductive biases of the task. A diffusion model is proposed to model panoptic masks, with a simple architecture and a generic loss function. By simply adding past predictions as a conditioning signal, our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically. With extensive experiments, we demonstrate that our simple approach performs competitively with state-of-the-art specialist methods in similar settings.
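
To make the conditioning idea concrete, below is a minimal, self-contained sketch in PyTorch. It is not the authors' released implementation: the analog-bits label encoding, the toy denoiser, the linear noise schedule, and all names (Denoiser, training_step, NUM_BITS, ...) are illustrative assumptions. It only shows the general pattern the abstract describes: a diffusion-style model that denoises a discrete panoptic mask while conditioning on the image and, for video, on the previous frame's mask.

# Illustrative sketch only; components are hypothetical stand-ins, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_BITS = 16  # bits used to encode a (semantic class, instance ID) label per pixel

def labels_to_analog_bits(labels: torch.Tensor) -> torch.Tensor:
    """Map integer panoptic labels (B, H, W) to analog bits in {-1, +1}, shape (B, NUM_BITS, H, W)."""
    shifts = torch.arange(NUM_BITS, device=labels.device).view(1, -1, 1, 1)
    bits = (labels.unsqueeze(1) >> shifts) & 1
    return bits.float() * 2.0 - 1.0

def analog_bits_to_labels(bits: torch.Tensor) -> torch.Tensor:
    """Threshold analog bits back to integer panoptic labels."""
    hard = (bits > 0).long()
    weights = (2 ** torch.arange(NUM_BITS, device=bits.device)).view(1, -1, 1, 1)
    return (hard * weights).sum(dim=1)

class Denoiser(nn.Module):
    """Toy conditional denoiser: predicts clean mask bits from noisy bits,
    the input image, and (optionally) the previous frame's mask bits."""
    def __init__(self, img_channels: int = 3, hidden: int = 64):
        super().__init__()
        in_ch = NUM_BITS + img_channels + NUM_BITS  # noisy bits + image + past-mask bits
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, NUM_BITS, 3, padding=1),
        )

    def forward(self, noisy_bits, image, past_bits=None):
        if past_bits is None:  # image-only case: condition on an all-zero past mask
            past_bits = torch.zeros_like(noisy_bits)
        return self.net(torch.cat([noisy_bits, image, past_bits], dim=1))

def training_step(model, image, labels, past_labels=None):
    """One denoising training step with a simple linear corruption schedule (illustrative)."""
    x0 = labels_to_analog_bits(labels)
    t = torch.rand(x0.size(0), 1, 1, 1, device=x0.device)  # random noise level in (0, 1)
    noise = torch.randn_like(x0)
    x_t = (1 - t) * x0 + t * noise                          # corrupt the clean bits
    past = labels_to_analog_bits(past_labels) if past_labels is not None else None
    pred_x0 = model(x_t, image, past)
    return F.mse_loss(pred_x0, x0)                          # generic regression loss

if __name__ == "__main__":
    model = Denoiser()
    image = torch.randn(2, 3, 64, 64)
    labels = torch.randint(0, 2 ** NUM_BITS, (2, 64, 64))
    prev = torch.randint(0, 2 ** NUM_BITS, (2, 64, 64))     # previous frame's mask (video case)
    print(training_step(model, image, labels, past_labels=prev).item())

In this sketch the video setting differs from the image setting only in what is concatenated as conditioning, which mirrors the abstract's claim that adding past predictions as a conditioning signal is enough to extend the model to streaming video.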

Cite

Text

Chen et al. "A Generalist Framework for Panoptic Segmentation of Images and Videos." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00090

Markdown

[Chen et al. "A Generalist Framework for Panoptic Segmentation of Images and Videos." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/chen2023iccv-generalist/) doi:10.1109/ICCV51070.2023.00090

BibTeX

@inproceedings{chen2023iccv-generalist,
  title     = {{A Generalist Framework for Panoptic Segmentation of Images and Videos}},
  author    = {Chen, Ting and Li, Lala and Saxena, Saurabh and Hinton, Geoffrey and Fleet, David J.},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {909-919},
  doi       = {10.1109/ICCV51070.2023.00090},
  url       = {https://mlanthology.org/iccv/2023/chen2023iccv-generalist/}
}