ZIM: Zero-Shot Image Matting for Anything

ICCV 2025 pp. 23828-23838

Abstract

The recent segmentation foundation model, Segment Anything Model (SAM), exhibits strong zero-shot segmentation capabilities, but it falls short in generating fine-grained, precise masks. To address this limitation, we propose a novel zero-shot image matting model, called ZIM, with two key contributions. First, we develop a label converter that transforms segmentation labels into detailed matte labels, constructing the new SA1B-Matte dataset without costly manual annotations. Training SAM with this dataset enables it to generate precise matte masks while maintaining its zero-shot capability. Second, we design the zero-shot matting model with a hierarchical pixel decoder to enhance mask representation, along with a prompt-aware masked attention mechanism that improves performance by enabling the model to focus on regions specified by visual prompts. We evaluate ZIM using the newly introduced MicroMat-3K test set, which contains high-quality micro-level matte labels. Experimental results show that ZIM outperforms existing methods in fine-grained mask generation and zero-shot generalization. Furthermore, we demonstrate the versatility of ZIM in various downstream tasks requiring precise masks, such as image inpainting and 3D segmentation. Our contributions provide a robust foundation for advancing zero-shot matting and its downstream applications across a wide range of computer vision tasks. The code is available at https://naver-ai.github.io/ZIM.
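
To make the idea of prompt-aware masked attention concrete, the following is a minimal NumPy sketch of cross-attention restricted to the image tokens that overlap a box prompt. It is an illustration under stated assumptions, not the ZIM implementation: the function names `box_to_token_mask` and `masked_cross_attention`, the hard binary mask, and the single-head formulation are all hypothetical choices made here, and the paper's actual mechanism may differ (for example, in how point prompts are handled).

```python
import numpy as np

def box_to_token_mask(box, grid_h, grid_w, image_h, image_w):
    """Hypothetical helper: mark patch tokens whose pixel extent overlaps a box.

    box is (x0, y0, x1, y1) in pixel coordinates.
    Returns a boolean array of shape (grid_h * grid_w,).
    """
    x0, y0, x1, y1 = box
    patch_h = image_h / grid_h
    patch_w = image_w / grid_w
    rows = np.arange(grid_h)
    cols = np.arange(grid_w)
    # A token row/column is hit when its pixel interval intersects the box.
    row_hit = ((rows + 1) * patch_h > y0) & (rows * patch_h < y1)
    col_hit = ((cols + 1) * patch_w > x0) & (cols * patch_w < x1)
    return (row_hit[:, None] & col_hit[None, :]).reshape(-1)

def masked_cross_attention(queries, keys, values, token_mask):
    """Single-head scaled dot-product attention limited to masked-in tokens.

    queries: (Q, D), keys and values: (N, D), token_mask: (N,) boolean.
    Returns the attended output (Q, D) and the attention weights (Q, N).
    """
    if not token_mask.any():
        # Assumption: fall back to full attention if the prompt covers no token.
        token_mask = np.ones_like(token_mask, dtype=bool)
    scale = 1.0 / np.sqrt(queries.shape[-1])
    logits = (queries @ keys.T) * scale
    # Tokens outside the prompt region get -inf, so their weight is exactly 0.
    logits = np.where(token_mask[None, :], logits, -np.inf)
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights
```

For a 64x64 image split into a 4x4 token grid, a box covering the top-left quadrant selects four tokens, and every query's attention weights are zero outside those four. A soft or learned mask would be an equally plausible design; the hard mask is used here only because it is the simplest to verify.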

Cite

Text

Kim et al. "ZIM: Zero-Shot Image Matting for Anything." International Conference on Computer Vision, 2025.

Markdown

[Kim et al. "ZIM: Zero-Shot Image Matting for Anything." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/kim2025iccv-zim/)

BibTeX

@inproceedings{kim2025iccv-zim,
  title     = {{ZIM: Zero-Shot Image Matting for Anything}},
  author    = {Kim, Beomyoung and Shin, Chanyong and Jeong, Joonhyun and Jung, Hyungsik and Lee, Se-Yun and Chun, Sewhan and Hwang, Dong-Hyun and Yu, Joonsang},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {23828-23838},
  url       = {https://mlanthology.org/iccv/2025/kim2025iccv-zim/}
}