Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy
Abstract
As Vision Transformers (ViTs) increasingly set new benchmarks in computer vision, their practical deployment on inference engines is often hindered by their significant memory bandwidth and (on-chip) memory footprint requirements. This paper addresses this memory limitation by introducing an activation-aware model compression methodology that uses selective low-rank weight tensor approximations of different layers to reduce the parameter count of ViTs. The key idea is to decompose the weight tensors into a sum of two parameter-efficient matrices while minimizing the error between the product of the input activations with the original weight tensor and the product of the input activations with the approximate tensor sum. Notably, the presented method reduces the parameter count of DeiT-B by 60% with less than a 1% accuracy drop on the ImageNet dataset, overcoming the accuracy degradation typically seen in low-rank approximations. In addition, the presented compression technique can compress large DeiT/ViT models to roughly the model size of smaller DeiT/ViT variants while yielding up to a 1.8% accuracy gain.
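The key idea in the abstract, fitting the compressed factors to the layer's input activations rather than to the weights alone, can be illustrated with a minimal NumPy sketch. This is a generic activation-aware low-rank factorization for a single linear layer, not the paper's exact mixed-rank scheme; the function name, the Cholesky-whitening formulation, and the small ridge term are assumptions made for illustration.

```python
import numpy as np

def activation_aware_lowrank(W, X, rank, ridge=1e-6):
    """Approximate W (d_in, d_out) by A @ B with A (d_in, r), B (r, d_out),
    minimizing ||X @ W - X @ (A @ B)||_F over rank-r factorizations,
    where X (n, d_in) holds sample input activations for the layer."""
    d_in = X.shape[1]
    # Cholesky factor of the activation Gram matrix: X^T X = L L^T.
    # With S = L^T we have ||X M||_F = ||S M||_F, so the activation-weighted
    # problem reduces to a plain truncated SVD of S @ W.
    L = np.linalg.cholesky(X.T @ X + ridge * np.eye(d_in))
    S = L.T
    U, sig, Vt = np.linalg.svd(S @ W, full_matrices=False)
    # Undo the whitening on the left factor: A = S^{-1} U_r diag(sig_r).
    A = np.linalg.solve(S, U[:, :rank] * sig[:rank])
    B = Vt[:rank]
    return A, B
```

For a layer with `d_in = d_out = d`, storing `A` and `B` costs `2 * d * rank` parameters instead of `d * d`, so a rank well below `d / 2` yields compression; the activation weighting keeps the error low on the inputs the layer actually sees, which is what distinguishes this from plain weight-only SVD truncation.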
Cite
Text
Azizi et al. "Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91979-4_6
Markdown
[Azizi et al. "Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/azizi2024eccvw-memoryefficient/) doi:10.1007/978-3-031-91979-4_6
BibTeX
@inproceedings{azizi2024eccvw-memoryefficient,
title = {{Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy}},
author = {Azizi, Seyedarmin and Nazemi, Mahdi and Pedram, Massoud},
booktitle = {European Conference on Computer Vision Workshops},
year = {2024},
pages = {55--66},
doi = {10.1007/978-3-031-91979-4_6},
url = {https://mlanthology.org/eccvw/2024/azizi2024eccvw-memoryefficient/}
}