Unveiling the Dynamics of Transfer Learning Representations
Abstract
Representation similarity analysis is used to analyze the dynamics of neural networks. When it is used to measure the importance of layers in fine-tuning, it reveals that representations change less in early layers than in later layers, which supports freezing early layers during fine-tuning. In this paper, we discuss how these similarity scores of the representations can be interpreted. We argue that the scalar value of a similarity score between the representations of a trained and an untrained network should not be interpreted directly. In addition, similarity values obtained by comparing learned representations to their initialized representations should not be compared across layers to judge layer importance. Instead, the similarity scores should be put in relation to those of similar problems to be assessed appropriately. This can be done by a controlled randomization of the dataset, which covers the spectrum from the original to a fully random dataset. We find that the representation change depends on the size of the training data, its structure, and, if the network is pre-trained, on how close the task is to the pre-training task. If a dataset does not have a meaningful hierarchical structure, smaller networks tend to *unlearn* the knowledge of the pre-trained network. In contrast, larger networks still use their learned capabilities.
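To make the comparison described above concrete, the sketch below computes a representation similarity score between a layer's activations after fine-tuning and at initialization, on labels that are partially randomized to cover the original-to-random spectrum. This is a minimal illustration under stated assumptions: the abstract does not fix the similarity measure, and linear CKA is only one common choice; randomize_labels and the activation matrices are hypothetical stand-ins for the actual data pipeline.

import numpy as np

def linear_cka(X, Y):
    # Linear CKA between two activation matrices of shape (n_samples, dim).
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def randomize_labels(labels, fraction, rng):
    # Replace the given fraction of labels with uniformly random classes,
    # covering the spectrum from the original (0.0) to fully random (1.0).
    labels = labels.copy()
    idx = rng.choice(len(labels), size=int(fraction * len(labels)), replace=False)
    labels[idx] = rng.integers(0, labels.max() + 1, size=len(idx))
    return labels

# Toy usage: random matrices stand in for one layer's activations before and
# after fine-tuning on a dataset with half of the labels randomized.
rng = np.random.default_rng(0)
labels = randomize_labels(np.arange(1000) % 10, fraction=0.5, rng=rng)
acts_init = rng.normal(size=(200, 64))   # activations at initialization
acts_tuned = rng.normal(size=(200, 64))  # activations after fine-tuning
print(f"linear CKA: {linear_cka(acts_init, acts_tuned):.3f}")

Running the fine-tuning and similarity measurement at several randomization fractions yields the reference curve against which a similarity score on the original dataset can then be assessed, rather than interpreting the scalar value in isolation.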
Cite
Text
Goerttler and Obermayer. "Unveiling the Dynamics of Transfer Learning Representations." ICLR 2024 Workshops: Re-Align, 2024.
Markdown
[Goerttler and Obermayer. "Unveiling the Dynamics of Transfer Learning Representations." ICLR 2024 Workshops: Re-Align, 2024.](https://mlanthology.org/iclrw/2024/goerttler2024iclrw-unveiling/)
BibTeX
@inproceedings{goerttler2024iclrw-unveiling,
  title     = {{Unveiling the Dynamics of Transfer Learning Representations}},
  author    = {Goerttler, Thomas and Obermayer, Klaus},
  booktitle = {ICLR 2024 Workshops: Re-Align},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/goerttler2024iclrw-unveiling/}
}