Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis

Abstract

Fine-tuning significantly improves the performance of Large Language Models (LLMs), yet its underlying mechanisms remain poorly understood. This paper aims to provide an in-depth interpretation of the fine-tuning process through circuit analysis, a popular tool in Mechanistic Interpretability (MI). Unlike previous studies (Prakash et al., 2024; Chhabra et al., 2024) that focus on tasks where pre-trained models already perform well, we develop a set of mathematical tasks where fine-tuning yields substantial performance gains, bringing the setup closer to real-world scenarios. In our experiments, we identify circuits at various checkpoints during fine-tuning and examine the interplay between circuit analysis, fine-tuning methods, and task complexities. First, we find that while circuits maintain high node similarity before and after fine-tuning, their edges undergo significant changes, contrasting with previous work (Prakash et al., 2024; Chhabra et al., 2024) that reported only small circuit additions after fine-tuning. Based on these observations, we develop a circuit-aware Low-Rank Adaptation (LoRA) method that assigns ranks to layers according to edge changes in the circuits. Experimental results demonstrate that our circuit-based LoRA achieves an average improvement of 2.46% over standard LoRA with comparable parameter sizes. Furthermore, we explore how combining circuits from subtasks can enhance fine-tuning in compositional tasks, offering new insights into task design and deepening our understanding of circuit dynamics and fine-tuning mechanisms.
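
The abstract does not spell out the rank-allocation rule, so the following is only a minimal sketch of how a circuit-aware rank assignment could work: a fixed total rank budget (chosen to match the parameter count of a uniform-rank LoRA baseline) is distributed across layers in proportion to each layer's circuit edge-change score. The function name `allocate_lora_ranks`, the proportional-with-floor scheme, and the example numbers are illustrative assumptions, not the paper's exact method.

```python
def allocate_lora_ranks(edge_changes, total_rank_budget, min_rank=1):
    """Distribute a fixed LoRA rank budget across layers in proportion
    to per-layer circuit edge-change scores (hypothetical allocation
    rule; the paper's actual scheme may differ).

    edge_changes: non-negative per-layer edge-change counts.
    total_rank_budget: sum of ranks across all layers, chosen so the
        total parameter count matches a uniform-rank LoRA baseline.
    min_rank: floor that keeps every layer trainable; note it can push
        the final sum slightly above the budget in degenerate cases.
    """
    total = sum(edge_changes)
    if total == 0:
        # No signal from the circuits: fall back to a uniform allocation.
        base = total_rank_budget // len(edge_changes)
        return [max(min_rank, base)] * len(edge_changes)

    # Proportional allocation, truncated to integers and floored at min_rank.
    raw = [c / total * total_rank_budget for c in edge_changes]
    ranks = [max(min_rank, int(r)) for r in raw]

    # Spend any leftover budget on the layers with the largest remainders.
    leftover = total_rank_budget - sum(ranks)
    by_remainder = sorted(range(len(raw)),
                          key=lambda i: raw[i] - int(raw[i]),
                          reverse=True)
    for i in by_remainder[:max(0, leftover)]:
        ranks[i] += 1
    return ranks

# Example: 4 layers; edge changes measured between circuits extracted
# before and after fine-tuning. A budget of 32 matches a uniform
# rank-8 LoRA baseline across the same 4 layers.
print(allocate_lora_ranks([50, 10, 30, 10], total_rank_budget=32))
# -> [16, 3, 10, 3]
```

Under this sketch, layers whose circuit edges change most receive the highest ranks, which is one natural reading of "assigns ranks to layers according to edge changes in the circuits" while keeping the parameter count comparable to standard LoRA.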

Cite

Text

Wang et al. "Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Wang et al. "Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/wang2025icml-understanding/)

BibTeX

@inproceedings{wang2025icml-understanding,
  title     = {{Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis}},
  author    = {Wang, Xu and Hu, Yan and Du, Wenyu and Cheng, Reynold and Wang, Benyou and Zou, Difan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {63088--63112},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/wang2025icml-understanding/}
}