Tangent Transformers for Composition, Privacy and Removal

Abstract

We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers obtained by computing a First-order Taylor Expansion around a pre-trained initialization. We show that the Jacobian-Vector Product resulting from linearization can be computed efficiently in a single forward pass, reducing training and inference cost to the same order of magnitude as its original non-linear counterpart, while using the same number of parameters. Furthermore, we show that, when applied to various downstream visual classification tasks, the resulting Tangent Transformer fine-tuned with TAFT can perform comparably with fine-tuning the original non-linear network. Since Tangent Transformers are linear with respect to the new set of weights, and the resulting fine-tuning loss is convex, we show that TAFT enjoys several advantages compared to non-linear fine-tuning when it comes to model composition, parallel training, machine unlearning, and differential privacy. Our code is available at: https://github.com/tianyu139/tangent-model-composition
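The linearized model described in the abstract can be evaluated with a single forward-mode Jacobian-Vector Product. Below is a minimal sketch of that idea in PyTorch using `torch.func`; it is not the authors' released implementation, and the names `tangent_forward`, `model`, `theta0`, and `delta` are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of evaluating a tangent (linearized)
# model f_lin(x) = f(x; theta0) + J_theta f(x; theta0) @ delta
# with a single forward-mode Jacobian-Vector Product via torch.func.
import torch
from torch.func import functional_call, jvp

def tangent_forward(model, theta0, delta, x):
    # Treat the network as a function of its parameters for a fixed input x.
    def f(params):
        return functional_call(model, params, (x,))
    # jvp returns both the primal output f(x; theta0) and the directional
    # derivative J_theta f(x; theta0) @ delta in one forward-mode pass.
    out, dout = jvp(f, (theta0,), (delta,))
    return out + dout

# Illustrative usage: keep theta0 fixed at the pre-trained weights and treat
# delta as the perturbation parameterizing the tangent model.
# theta0 = {k: v.detach() for k, v in model.named_parameters()}
# delta  = {k: torch.zeros_like(v) for k, v in model.named_parameters()}
# logits = tangent_forward(model, theta0, delta, x)
```

Because the output above is affine in `delta`, a standard fine-tuning loss such as cross-entropy over these outputs is convex in `delta`, which is the property the abstract points to for composition, parallel training, unlearning, and differential privacy.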

Cite

Text

Liu et al. "Tangent Transformers for Composition, Privacy and Removal." International Conference on Learning Representations, 2024.

Markdown

[Liu et al. "Tangent Transformers for Composition, Privacy and Removal." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/liu2024iclr-tangent/)

BibTeX

@inproceedings{liu2024iclr-tangent,
  title     = {{Tangent Transformers for Composition, Privacy and Removal}},
  author    = {Liu, Tian Yu and Golatkar, Aditya and Soatto, Stefano},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/liu2024iclr-tangent/}
}