Alignment and Generation Adapter for Efficient Video-Text Understanding
Abstract
Pre-trained models have demonstrated considerable performance, especially in enhancing cross-modal understanding between videos and text. However, fine-tuning them at scale becomes costly and poses challenges for adapting to various downstream tasks. To tackle these challenges, we propose the Alignment-generation Adapter (AGAdapter), establishing semantic coherence between alignment and generation models for efficient video-text adaptation across multiple tasks simultaneously. We propose an alignment adapter with knowledge-sharing to adapt the frozen CLIP model for fine-grained video-language interaction. Additionally, we introduce the generation adapter with prompt tuning to leverage the large language model for captioning. Furthermore, we introduce instruction joint tuning, combining textual and cross-modal instructions, to capture detailed descriptions. Our AGAdapter achieves state-of-the-art performance on video-text retrieval and video captioning tasks, including two benchmarks, MSR-VTT and ActivityNet.
Cite
Text
Fang et al. "Alignment and Generation Adapter for Efficient Video-Text Understanding." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00296
Markdown
[Fang et al. "Alignment and Generation Adapter for Efficient Video-Text Understanding." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/fang2023iccvw-alignment/) doi:10.1109/ICCVW60793.2023.00296
BibTeX
@inproceedings{fang2023iccvw-alignment,
title = {{Alignment and Generation Adapter for Efficient Video-Text Understanding}},
author = {Fang, Han and Yang, Zhifei and Wei, Yuhan and Zang, Xianghao and Ban, Chao and Feng, Zerun and He, Zhongjiang and Li, Yongxiang and Sun, Hao},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2023},
pages = {2783-2789},
doi = {10.1109/ICCVW60793.2023.00296},
url = {https://mlanthology.org/iccvw/2023/fang2023iccvw-alignment/}
}