conSAMme: Achieving Consistent Segmentations with SAM
Abstract
Multi-output interactive segmentation methods generate multiple binary masks when given user guidance, such as clicks. However, the ordering of the returned masks is unpredictable, and the masks themselves can change when given slightly different user guidance. To address these issues, we propose conSAMme, a contrastive learning framework that conditions on explicit hierarchical semantics and leverages weakly supervised part segmentation data together with a novel episodic click sampling strategy. Evaluations of conSAMme's performance, click robustness, and mask ordering show substantial improvements over baselines, while using less than 1% extra training data relative to the amount used for the baseline.
Cite
Text
Myers-Dean et al. "conSAMme: Achieving Consistent Segmentations with SAM." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Myers-Dean et al. "conSAMme: Achieving Consistent Segmentations with SAM." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/myersdean2025cvprw-consamme/)

BibTeX
@inproceedings{myersdean2025cvprw-consamme,
title = {{conSAMme: Achieving Consistent Segmentations with SAM}},
author = {Myers-Dean, Josh and Liu, Kangning and Price, Brian L. and Fan, Yifei and Kuen, Jason and Gurari, Danna},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2025},
pages = {759--768},
url = {https://mlanthology.org/cvprw/2025/myersdean2025cvprw-consamme/}
}