EntitySAM: Segment Everything in Video
Abstract
Automatically tracking and segmenting every video entity remains a significant challenge. Despite rapid advances in video segmentation, even state-of-the-art models like SAM 2 struggle to consistently track all entities across a video, a task we refer to as Video Entity Segmentation. We propose EntitySAM, a framework for zero-shot video entity segmentation. EntitySAM extends SAM 2 by removing the need for explicit prompts, allowing automatic discovery and tracking of all entities, including those that first appear in later frames. Inspired by transformer-based object detectors, we incorporate query-based entity discovery and association into SAM 2. Specifically, we introduce an entity decoder to facilitate inter-object communication and an automatic prompt generator using learnable object queries. Additionally, we add a semantic encoder to enhance SAM 2's semantic awareness, improving segmentation quality. Trained only on category-agnostic, image-level mask annotations from the COCO dataset, EntitySAM demonstrates strong generalization on four zero-shot video segmentation tasks: Video Entity, Panoptic, Instance, and Semantic Segmentation. Results on six popular benchmarks show that EntitySAM outperforms previous unified video segmentation methods and strong baselines, setting new standards for zero-shot video segmentation.
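The abstract describes learnable object queries that exchange information with each other (inter-object communication) and attend to frame features to discover entities. The sketch below illustrates that general query-based pattern in plain Python, in the style of DETR/Mask2Former decoders; it is not EntitySAM's implementation. All names (`decoder_layer`, `predict_masks`, `segment_video`) are hypothetical. It substitutes a simple dot-product mask head for SAM 2's prompt-driven mask decoder, and it *assumes* that entities are associated across frames by carrying the refined queries forward, so query `i` keeps denoting the same entity.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def normalize(vecs):
    # L2-normalize each vector so repeated residual updates stay bounded.
    out = []
    for v in vecs:
        n = max(math.sqrt(dot(v, v)), 1e-8)
        out.append([x / n for x in v])
    return out

def attend(queries, keys, values):
    """Single-head scaled dot-product attention: each query mixes the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        w = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def decoder_layer(queries, pixel_feats):
    # Self-attention among queries: entities "talk" to each other.
    queries = normalize(add(queries, attend(queries, queries, queries)))
    # Cross-attention: each query gathers evidence from per-pixel features.
    return normalize(add(queries, attend(queries, pixel_feats, pixel_feats)))

def predict_masks(queries, pixel_feats, threshold=0.0):
    # Hypothetical mask head: pixel belongs to an entity if query . pixel > threshold.
    return [[dot(q, p) > threshold for p in pixel_feats] for q in queries]

def segment_video(frames, init_queries, num_layers=2):
    # Assumed association scheme: refined queries from frame t seed frame t+1.
    queries = init_queries
    per_frame_masks = []
    for pixel_feats in frames:
        for _ in range(num_layers):
            queries = decoder_layer(queries, pixel_feats)
        per_frame_masks.append(predict_masks(queries, pixel_feats))
    return per_frame_masks

# Toy usage: random vectors stand in for learned queries and encoder features.
rng = random.Random(0)
dim, num_queries, num_pixels = 4, 3, 6
frames = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_pixels)]
          for _ in range(2)]
init_queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
masks = segment_video(frames, init_queries)
print(len(masks), len(masks[0]), len(masks[0][0]))  # 2 3 6
```

Because the query set is fixed in size and reused across frames, every pixel mask is tied to a persistent query index. This is one simple way a prompt-free model could report "all entities" without per-object user clicks. A real system would learn the initial queries and would need a mechanism for entities that appear mid-video, which this toy omits.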
Cite
Ye et al. "EntitySAM: Segment Everything in Video." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02257
[Ye et al. "EntitySAM: Segment Everything in Video." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/ye2025cvpr-entitysam/) doi:10.1109/CVPR52734.2025.02257
@inproceedings{ye2025cvpr-entitysam,
title = {{EntitySAM: Segment Everything in Video}},
author = {Ye, Mingqiao and Oh, Seoung Wug and Ke, Lei and Lee, Joon-Young},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {24234-24243},
doi = {10.1109/CVPR52734.2025.02257},
url = {https://mlanthology.org/cvpr/2025/ye2025cvpr-entitysam/}
}