Self-Adapting Large Visual-Language Models to Edge Devices Across Visual Modalities
Abstract
Recent advancements in Vision-Language (VL) models have sparked interest in their deployment on edge devices, yet challenges in handling diverse visual modalities, manual annotation, and computational constraints remain. We introduce EdgeVL, a novel framework that bridges this gap by seamlessly integrating dual-modality knowledge distillation and quantization-aware contrastive learning. This approach enables the adaptation of large VL models, like CLIP, for efficient use with both RGB and non-RGB images on resource-limited devices without the need for manual annotations. EdgeVL not only transfers visual-language alignment capabilities to compact models but also maintains feature quality post-quantization, significantly enhancing open-vocabulary classification performance across various visual modalities. Our work represents the first systematic effort to adapt large VL models for edge deployment, showcasing up to 15.4% accuracy improvements on multiple datasets and up to a 93-fold reduction in model size. Code available at https://github.com/ramdrop/edgevl.
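The abstract's two core ingredients, distilling a frozen teacher's image embeddings into a compact student and keeping those embeddings useful after quantization, can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the function names, the simple min-max fake-quantization step, and the InfoNCE-style pairing of student and teacher features are all illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Normalize feature rows to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fake_quantize(x, num_bits=8):
    """Simulate low-bit feature quantization (illustrative stand-in for
    quantization-aware training: round to the int grid, keep float dtype)."""
    scale = np.max(np.abs(x)) / (2 ** (num_bits - 1) - 1)
    return np.round(x / scale) * scale

def contrastive_distill_loss(student_feats, teacher_feats, temperature=0.07):
    """InfoNCE-style loss: pull each (quantized) student embedding toward its
    teacher counterpart and push it away from other samples in the batch."""
    s = l2_normalize(fake_quantize(student_feats))
    t = l2_normalize(teacher_feats)
    logits = s @ t.T / temperature                  # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # diagonal = matched pairs

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 512))                    # e.g. frozen CLIP features
student = teacher + 0.1 * rng.normal(size=(4, 512))    # near-aligned student
loss = contrastive_distill_loss(student, teacher)
```

Because the same loss is computed on quantized student features, the student is trained to produce embeddings that stay aligned with the teacher even after the precision loss of quantization; a well-aligned student yields a much lower loss than an unaligned one.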
Cite
Text
Cai et al. "Self-Adapting Large Visual-Language Models to Edge Devices Across Visual Modalities." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73390-1_18
Markdown
[Cai et al. "Self-Adapting Large Visual-Language Models to Edge Devices Across Visual Modalities." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/cai2024eccv-selfadapting/) doi:10.1007/978-3-031-73390-1_18
BibTeX
@inproceedings{cai2024eccv-selfadapting,
title = {{Self-Adapting Large Visual-Language Models to Edge Devices Across Visual Modalities}},
author = {Cai, Kaiwen and Duan, ZheKai and Liu, Gaowen and Fleming, Charles and Lu, Chris Xiaoxuan},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73390-1_18},
url = {https://mlanthology.org/eccv/2024/cai2024eccv-selfadapting/}
}