TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs

Abstract

Past work has studied the effects of fine-tuning on large language models’ (LLMs) overall performance on certain tasks. However, a way to quantitatively and systematically analyze its effect on individual outputs is still lacking. In this work, we propose a new method for measuring the contribution that fine-tuning makes to individual LLM responses, assuming access to the original pre-trained model. Our method takes into account the model’s intermediate hidden states, giving a more fine-grained insight into the effects of fine-tuning than a simple comparison of the final outputs of pre-trained and fine-tuned models. We introduce and theoretically analyze an exact decomposition of any fine-tuned LLM into a pre-training component and a fine-tuning component. Empirically, we find that one can steer model behavior and performance by up- or down-scaling the fine-tuning component during the forward pass. Motivated by this finding and our theoretical analysis, we define the Tuning Contribution ($\mathrm{TuCo}$) in terms of the ratio of the magnitudes of the fine-tuning component and the pre-training component. We find that three prominent adversarial attacks on LLMs circumvent safety measures in a way that reduces the Tuning Contribution, and that $\mathrm{TuCo}$ is consistently lower on prompts where the attacks succeed than on prompts where they do not. This suggests that attenuating the effect of fine-tuning on model outputs plays a role in the success of these attacks. In short, $\mathrm{TuCo}$ enables the quantitative study of how fine-tuning influences model behavior and safety, and vice versa.
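To illustrate the quantity described above, the following is a minimal sketch in our own notation (the symbols $\mathrm{PTC}$ and $\mathrm{FTC}$ are illustrative and not taken from the paper): writing $\mathrm{PTC}(x)$ and $\mathrm{FTC}(x)$ for the pre-training and fine-tuning components of the hidden-state updates accumulated over the forward pass on a prompt $x$, a ratio of the form

$$\mathrm{TuCo}(x) \;\approx\; \frac{\lVert \mathrm{FTC}(x) \rVert}{\lVert \mathrm{PTC}(x) \rVert + \lVert \mathrm{FTC}(x) \rVert}$$

captures the relative magnitude of the fine-tuning component; the paper's exact definition and normalization may differ.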

Cite

Text

Nuti et al. "TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Nuti et al. "TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/nuti2025icml-tuco/)

BibTeX

@inproceedings{nuti2025icml-tuco,
  title     = {{TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs}},
  author    = {Nuti, Felipe Pinto Coelho and Franzmeyer, Tim and Henriques, Joao F.},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {46837--46876},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/nuti2025icml-tuco/}
}