WST: Wavelet-Based Multi-Scale Tuning for Visual Transfer Learning
Abstract
Large-scale pre-trained Vision Transformer (ViT) models have demonstrated remarkable performance on visual tasks but are computationally expensive to transfer to downstream tasks. Parameter-Efficient Fine-Tuning (PEFT) offers a promising approach to transfer by updating only a subset of parameters. However, PEFT's effectiveness is hindered by discrepancies between pre-training and downstream tasks in terms of object scale and granularity. Downstream tasks often focus on finer-grained and more specialized recognition, which requires more detailed features, yet existing PEFT methods for ViT provide limited diversity of feature scales. To address this, we propose a novel PEFT method named Wavelet-based multi-Scale Tuning (WST), which learns multi-scale features in a simple and efficient way. WST introduces a parallel fine-tuning patch embedding branch with a smaller patch size than the pre-trained model to capture finer-grained features. Furthermore, to handle the computational cost of the resulting longer token sequence, WST uses wavelet fine-tuning blocks that balance efficiency and performance. In each block, a wavelet transform performs invertible, lossless down-sampling of the longer token sequence, aligning its length with that of the backbone, and two lightweight linear mappings learn task-specific features. This design facilitates efficient multi-scale information exchange between the pre-trained backbone and the fine-tuning branch. Extensive experiments on transfer learning demonstrate the promising performance and efficiency of WST.
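The down-sampling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Haar wavelet, the 224-pixel input, the patch sizes 16 and 8, the channel widths, and the names `haar_dwt2`, `haar_idwt2`, `w_down`, and `w_up` are all assumptions chosen for illustration. The sketch shows that a one-level 2D wavelet transform shrinks a 28x28 fine token grid to the 14x14 grid of the backbone while keeping every value recoverable, after which two small linear maps produce a per-token update.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar transform of a token grid x with shape (H, W, C).

    H and W must be even. Returns shape (H/2, W/2, 4C): the LL, LH, HL, HH
    sub-bands concatenated along the channel axis.
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return np.concatenate([ll, lh, hl, hh], axis=-1)

def haar_idwt2(y):
    """Inverse of haar_dwt2: (H/2, W/2, 4C) back to (H, W, C)."""
    ll, lh, hl, hh = np.split(y, 4, axis=-1)
    h, w, ch = ll.shape
    x = np.empty((2 * h, 2 * w, ch), dtype=y.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

rng = np.random.default_rng(0)

# Assumed setup: 224x224 image, backbone patch size 16 (14x14 tokens),
# fine-tuning branch patch size 8 (28x28 tokens), branch width 8 channels.
fine = rng.standard_normal((28, 28, 8))

# Down-sample the fine grid to the backbone's 14x14 grid without losing data.
coarse = haar_dwt2(fine)      # shape (14, 14, 32)
recon = haar_idwt2(coarse)    # shape (28, 28, 8), equal to `fine`

# Two lightweight linear maps (hypothetical sizes): 32 -> 16 -> 768,
# giving one update vector per backbone token.
w_down = rng.standard_normal((32, 16)) * 0.02
w_up = rng.standard_normal((16, 768)) * 0.02
delta = coarse.reshape(-1, 32) @ w_down @ w_up   # shape (196, 768)
```

In this sketch the token count drops by a factor of four (784 to 196) while the channel count grows by the same factor, which is why the transform can be inverted exactly. How WST combines `delta` with the backbone features is not specified in the abstract and is left out here.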
Cite
Text
Zeng et al. "WST: Wavelet-Based Multi-Scale Tuning for Visual Transfer Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I21.34387
Markdown
[Zeng et al. "WST: Wavelet-Based Multi-Scale Tuning for Visual Transfer Learning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zeng2025aaai-wst/) doi:10.1609/AAAI.V39I21.34387
BibTeX
@inproceedings{zeng2025aaai-wst,
title = {{WST: Wavelet-Based Multi-Scale Tuning for Visual Transfer Learning}},
author = {Zeng, Jia and Huang, Lan and Wang, Kangping},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {22317-22325},
doi = {10.1609/AAAI.V39I21.34387},
url = {https://mlanthology.org/aaai/2025/zeng2025aaai-wst/}
}