Conditional Transformer Fine-Tuning by Adaptive Layer Skipping
Abstract
In recent years, deep learning has achieved significant success across various domains, such as natural language processing and computer vision. Despite this progress, most deep neural networks assign a uniform computation cost to every input regardless of its complexity. Focusing on the Transformer architecture, our study addresses this challenge by introducing a sequence-level conditional fine-tuning framework based on adaptive layer skipping. The proposed framework dynamically adjusts computation according to the complexity of the input sequence and is tailored for modern accelerators such as TPUs and GPUs. We examined several measures of input complexity and found one to be highly effective at guiding the conditional computation. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of our methodology, achieving a substantial reduction in training time while maintaining the same predictive performance.
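To make the idea concrete, the sketch below shows one way adaptive layer skipping could look in code. It is a minimal, hypothetical PyTorch example, not the authors' implementation: the complexity proxy (embedding variance) and the rule mapping that score to a number of active layers are assumptions introduced for illustration, and the decision is made once per batch rather than per sequence for simplicity.

```python
import torch
import torch.nn as nn

class AdaptiveSkipEncoder(nn.Module):
    """Minimal sketch: run only a prefix of the encoder layers,
    chosen from a complexity score. Illustrative assumptions only."""

    def __init__(self, d_model=256, n_heads=4, n_layers=12):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )

    def complexity(self, x):
        # Hypothetical complexity proxy: variance of the token embeddings,
        # averaged over tokens and features (the paper's measure may differ).
        return x.var(dim=1).mean()

    def forward(self, x):
        # Map the scalar complexity score to a fraction of layers to keep.
        frac = torch.sigmoid(self.complexity(x))
        n_active = max(1, int(frac.item() * len(self.layers)))
        # Skipping whole layers keeps the remaining computation dense and
        # static-shaped, which is friendly to TPU/GPU execution.
        h = x
        for layer in self.layers[:n_active]:
            h = layer(h)
        return h

model = AdaptiveSkipEncoder()
x = torch.randn(8, 64, 256)   # (batch, sequence length, d_model)
out = model(x)                # "easier" inputs pass through fewer layers
```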
Cite
Text
Zhang et al. "Conditional Transformer Fine-Tuning by Adaptive Layer Skipping." ICLR 2024 Workshops: PML4LRS, 2024.
Markdown
[Zhang et al. "Conditional Transformer Fine-Tuning by Adaptive Layer Skipping." ICLR 2024 Workshops: PML4LRS, 2024.](https://mlanthology.org/iclrw/2024/zhang2024iclrw-conditional/)
BibTeX
@inproceedings{zhang2024iclrw-conditional,
title = {{Conditional Transformer Fine-Tuning by Adaptive Layer Skipping}},
author = {Zhang, Xingjian and Tang, Jiaxi and Liu, Yang and Yi, Xinyang and Wei, Li and Hong, Lichan and Mei, Qiaozhu and Chi, Ed H.},
booktitle = {ICLR 2024 Workshops: PML4LRS},
year = {2024},
url = {https://mlanthology.org/iclrw/2024/zhang2024iclrw-conditional/}
}