Q-MiniSAM2: A Quantization-Based Benchmark for Resource-Efficient Video Segmentation

Abstract

Segment Anything Model 2 (SAM2) is a new-generation, high-precision model for image and video segmentation with broad application prospects across computer vision. However, as a large-scale model, its large memory footprint and high computational cost make practical deployment challenging. This paper presents Q-MiniSAM2, an efficient Quantization-based segmentation benchmark tailored to optimize SAM2 by Minimizing memory consumption and accelerating computation. We begin by applying Post-Training Quantization (PTQ) to SAM2, which requires only a relatively small dataset for network calibration and thereby eliminates the need for retraining. Building upon PTQ, we further introduce a Hierarchy-based Video Quantization method to enhance the model’s capacity to capture video semantics and temporal correlations across different time scales. Furthermore, we observe that SAM2’s memory overhead is concentrated predominantly in the processing of historical frames. Because frames separated by short time intervals change almost imperceptibly, much of the cross-attention computed over them is redundant and significantly increases memory and computational costs. To tackle this issue, we propose an Adaptive Mutual-KV mechanism that reduces excessive cross-attention by leveraging inter-frame similarities. Comprehensive experiments demonstrate that the proposed approach achieves superior performance compared to state-of-the-art methods, underscoring its potential for efficient and scalable video segmentation.
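
As a rough illustration of the calibration step that PTQ relies on, the following minimal sketch derives an 8-bit asymmetric uniform quantizer from a small set of calibration samples using plain min-max statistics. It is a generic textbook scheme, not the paper's Hierarchy-based Video Quantization method, and the function names (`calibrate_minmax`, `quantize`, `dequantize`) are hypothetical.

```python
from typing import List, Tuple


def calibrate_minmax(samples: List[List[float]], num_bits: int = 8) -> Tuple[float, int]:
    """Derive (scale, zero_point) from calibration data via min-max statistics.

    Assumption: a generic asymmetric uniform quantizer, not the paper's scheme.
    """
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x: List[float], scale: float, zero_point: int, num_bits: int = 8) -> List[int]:
    """Map floats to integers in [0, 2**num_bits - 1], clamping out-of-range values."""
    qmax = 2 ** num_bits - 1
    return [min(max(round(v / scale) + zero_point, 0), qmax) for v in x]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [(v - zero_point) * scale for v in q]


# Calibrate on a few recorded activations, then quantize new values; no retraining involved.
scale, zero_point = calibrate_minmax([[-1.0, 0.5], [0.25, 1.0]])
codes = quantize([-0.3, 0.0, 0.7], scale, zero_point)
approx = dequantize(codes, scale, zero_point)
```

The point of the sketch is that only forward statistics from a small calibration set are needed to fix the quantizer parameters; the weights themselves are never updated.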

Cite

Text

Ren et al. "Q-MiniSAM2: A Quantization-Based Benchmark for Resource-Efficient Video Segmentation." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/204

Markdown

[Ren et al. "Q-MiniSAM2: A Quantization-Based Benchmark for Resource-Efficient Video Segmentation." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/ren2025ijcai-q/) doi:10.24963/IJCAI.2025/204

BibTeX

@inproceedings{ren2025ijcai-q,
  title     = {{Q-MiniSAM2: A Quantization-Based Benchmark for Resource-Efficient Video Segmentation}},
  author    = {Ren, Xuanxuan and Li, Xiangyu and Wei, Kun and Yang, Xu and Yang, Yanhua},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {1829--1837},
  doi       = {10.24963/IJCAI.2025/204},
  url       = {https://mlanthology.org/ijcai/2025/ren2025ijcai-q/}
}