Learning Dynamics of LLM Finetuning
Abstract
Learning dynamics, which describes how the learning of specific training examples influences the model's predictions on other examples, gives us a powerful tool for understanding the behavior of deep learning systems. We study the learning dynamics of large language models during different types of finetuning by analyzing the step-wise decomposition of how influence accumulates among different potential responses. Our framework allows a uniform interpretation of many interesting observations about the training of popular algorithms for both instruction tuning and preference tuning. In particular, we propose a hypothetical explanation of why specific types of hallucination are strengthened after finetuning, e.g., the model might use phrases or facts in the response for question B to answer question A, or the model might keep repeating similar simple phrases when generating responses. We also extend our framework and highlight a unique "squeezing effect" to explain a previously observed phenomenon in off-policy direct preference optimization (DPO), where running DPO for too long makes even the desired outputs less likely. This framework also provides insights into where the benefits of on-policy DPO and other variants come from. The analysis not only provides a novel perspective for understanding LLM finetuning but also inspires a simple, effective method to improve alignment performance.
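The quantity at the heart of the abstract, how a gradient step on one training example shifts the model's predictions on another, can be measured empirically. Below is a minimal sketch (not the paper's implementation or its analytic decomposition): a hypothetical toy causal LM, illustrative token sequences, and a single SFT step on a training pair (x_u, y_u), after which we record the change in log-probability of an unrelated probe response (x_o, y_o).

```python
# Minimal sketch (an assumption, not the paper's code): empirically measuring
# the per-step cross-example influence described in the abstract. The toy LM,
# vocabulary, and token sequences below are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB = 32  # toy vocabulary size


class TinyCausalLM(nn.Module):
    """A deliberately small next-token predictor standing in for an LLM."""

    def __init__(self, vocab=VOCAB, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):              # tokens: (batch, seq)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                 # logits: (batch, seq, vocab)


def response_log_prob(model, prompt, response):
    """Sum of log p(response token | prompt, previous response tokens)."""
    tokens = torch.cat([prompt, response]).unsqueeze(0)
    log_probs = F.log_softmax(model(tokens)[0], dim=-1)
    total, offset = 0.0, len(prompt)
    for i, tok in enumerate(response):
        # logits at position t predict the token at position t + 1
        total = total + log_probs[offset + i - 1, tok]
    return total


# Hypothetical token sequences: a training pair (question A, answer A)
# and an unrelated probe pair (question B, answer B).
x_u = torch.tensor([1, 2, 3, 4])   # training prompt
y_u = torch.tensor([5, 6, 7])      # training (desired) response
x_o = torch.tensor([8, 9, 10])     # probe prompt
y_o = torch.tensor([11, 12])       # probe response

model = TinyCausalLM()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

with torch.no_grad():
    before = response_log_prob(model, x_o, y_o).item()

# One supervised finetuning step on the training pair (negative log-likelihood).
opt.zero_grad()
loss = -response_log_prob(model, x_u, y_u)
loss.backward()
opt.step()

with torch.no_grad():
    after = response_log_prob(model, x_o, y_o).item()

# A positive delta means the update on (x_u, y_u) leaked probability onto the
# probe response -- the kind of cross-example influence the paper's step-wise
# decomposition analyzes term by term.
print(f"Delta log p(y_o | x_o) after one step: {after - before:+.4f}")
```

Tracking this delta across many probe responses and many steps is one simple way to see, empirically, effects like the hallucination and repetition patterns the abstract mentions; the paper's framework goes further by decomposing the delta analytically rather than just measuring it.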
Cite
Text
Ren and Sutherland. "Learning Dynamics of LLM Finetuning." International Conference on Learning Representations, 2025.
Markdown
[Ren and Sutherland. "Learning Dynamics of LLM Finetuning." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/ren2025iclr-learning/)
BibTeX
@inproceedings{ren2025iclr-learning,
  title = {{Learning Dynamics of LLM Finetuning}},
  author = {Ren, Yi and Sutherland, Danica J.},
  booktitle = {International Conference on Learning Representations},
  year = {2025},
  url = {https://mlanthology.org/iclr/2025/ren2025iclr-learning/}
}