PTQ4SAM: Post-Training Quantization for Segment Anything

Abstract

Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, its immense memory and computation costs hinder practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization, attributed to the bimodal distribution in post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives, and propose a Bimodal Integration strategy, which utilizes a mathematically equivalent sign operation to transform the bimodal distribution into a relatively easy-quantized normal distribution offline. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax through searching the optimal power-of-two base, which is hardware-friendly. Extensive experimental results across various vision tasks (instance segmentation, semantic segmentation, and object detection), datasets, and model variants show the superiority of PTQ4SAM. For example, when quantizing SAM-L to 6-bit, we achieve lossless accuracy for instance segmentation (about 0.5% drop) with theoretical 3.9× acceleration. The code is available at https://github.com/chengtao-lv/PTQ4SAM.
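To make the Bimodal Integration idea concrete, below is a minimal sketch (not the authors' implementation) of the sign equivalence it relies on: folding a per-channel sign vector gamma into both the query and key projections leaves the attention logits q·kᵀ unchanged, since gamma² = 1 channel-wise, so a bimodal channel can be flipped onto a single mode offline. The dimensions, the toy data, and deriving gamma from the key channel means are illustrative assumptions.

import torch

torch.manual_seed(0)
d = 8                                     # head dimension (illustrative)
x = torch.randn(16, d)                    # toy token embeddings (assumption)
Wq, Wk = torch.randn(d, d), torch.randn(d, d)
bq, bk = torch.randn(d), torch.randn(d)

q, k = x @ Wq + bq, x @ Wk + bk
logits = q @ k.t()                        # pre-Softmax attention logits

# Per-channel sign vector; derived here from the key channel means,
# a hypothetical heuristic standing in for locating the bimodal peaks.
gamma = torch.sign(k.mean(dim=0))
gamma[gamma == 0] = 1.0

# Fold gamma into both projections offline:
# (gamma * q) @ (gamma * k)^T == q @ k^T because gamma**2 == 1 channel-wise.
q2 = x @ (Wq * gamma) + bq * gamma
k2 = x @ (Wk * gamma) + bk * gamma

assert torch.allclose(q2 @ k2.t(), logits, atol=1e-4)

After the fold, the post-Key-Linear activations k2 are what the quantizer observes; the sign flip costs nothing at inference because it is absorbed into the weights and biases ahead of time.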
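Similarly, the power-of-two base search behind Adaptive Granularity Quantization can be sketched as follows, under stated assumptions (a log-base quantizer with base 2^(1/tau), 4-bit codes, MSE as the search objective, and toy attention maps; the paper's exact criterion may differ). Keeping the base a power of two keeps dequantization bit-shift friendly.

import torch

def log2_quant(a, bits=4, tau=1):
    """Quantize softmax outputs a in (0, 1] with base 2**(1/tau)."""
    qmax = 2 ** bits - 1
    # -log_b(a) with b = 2^(1/tau) equals -tau * log2(a)
    exp = torch.round(-tau * torch.log2(a.clamp_min(1e-12)))
    exp = exp.clamp(0, qmax)
    return 2.0 ** (-exp / tau)            # dequantized value b^(-exp)

attn = torch.softmax(torch.randn(4, 16, 16), dim=-1)   # toy attention maps

best_tau, best_err = None, float("inf")
for tau in (1, 2, 3, 4):                  # candidate bases 2^(1/tau)
    err = ((log2_quant(attn, bits=4, tau=tau) - attn) ** 2).mean()
    if err < best_err:
        best_tau, best_err = tau, err.item()
print(f"selected base 2^(1/{best_tau}), MSE={best_err:.2e}")

Intuitively, a small tau yields coarse, shift-only levels suited to sharply peaked post-Softmax distributions, while a larger tau adds finer levels for flatter ones, which is why searching the base per attention mechanism helps.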

Cite

Text

Lv et al. "PTQ4SAM: Post-Training Quantization for Segment Anything." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01509

Markdown

[Lv et al. "PTQ4SAM: Post-Training Quantization for Segment Anything." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/lv2024cvpr-ptq4sam/) doi:10.1109/CVPR52733.2024.01509

BibTeX

@inproceedings{lv2024cvpr-ptq4sam,
  title     = {{PTQ4SAM: Post-Training Quantization for Segment Anything}},
  author    = {Lv, Chengtao and Chen, Hong and Guo, Jinyang and Ding, Yifu and Liu, Xianglong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {15941--15951},
  doi       = {10.1109/CVPR52733.2024.01509},
  url       = {https://mlanthology.org/cvpr/2024/lv2024cvpr-ptq4sam/}
}