AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation

Abstract

Text-to-image diffusion models have shown remarkable success in synthesizing photo-realistic images. Apart from creative applications, can we use such models to synthesize samples that aid the few-shot training of discriminative models? In this work, we propose AlignDiff, a general framework for synthesizing training images and masks for few-shot segmentation. We identify two crucial misalignments that arise when utilizing pre-trained diffusion models in segmentation tasks, which need to be addressed to create realistic training samples and align the synthetic data distribution with the real training distribution: 1) instance-level misalignment, where generated samples of rare categories are often misaligned with target tasks, and 2) annotation-level misalignment, where diffusion models are limited to generating images without pixel-level annotations. AlignDiff overcomes both challenges by leveraging a few real samples to guide the generation, thus improving novel IoU over baseline methods in few-shot segmentation and generalized few-shot segmentation on Pascal-5i and COCO-20i by up to 80%. Notably, AlignDiff is capable of augmenting the learning of out-of-distribution uncommon categories on FSS-1000, whereas a naïve diffusion model generates samples that diminish segmentation performance.

Cite

Text

Qiu et al. "AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72940-9_22

Markdown

[Qiu et al. "AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/qiu2024eccv-aligndiff/) doi:10.1007/978-3-031-72940-9_22

BibTeX

@inproceedings{qiu2024eccv-aligndiff,
  title     = {{AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation}},
  author    = {Qiu, Ri-Zhao and Wang, Yu-Xiong and Hauser, Kris},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72940-9_22},
  url       = {https://mlanthology.org/eccv/2024/qiu2024eccv-aligndiff/}
}