MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks. Typically, LLMs are first pre-trained on large corpora and subsequently fine-tuned on task-specific datasets. However, during fine-tuning, LLMs may forget some knowledge acquired in the pre-training stage, leading to a decline in general capabilities. Existing approaches to mitigate forgetting often rely on access to pre-training data, which may be unavailable in many real-world scenarios—such as fine-tuning checkpoint-only open-source LLMs. To address this challenge, we propose a new fine-tuning algorithm termed Momentum-Filtered Optimizer (MoFO). MoFO is an extension of greedy block coordinate descent (BCD) methods: in each iteration, MoFO only updates the model parameters with the largest momentum magnitudes, while keeping all other parameters fixed. MoFO achieves similar fine-tuning performance to the default fine-tuning algorithm while effectively mitigating knowledge forgetting. We validate MoFO through rigorous convergence analysis and extensive experiments, demonstrating its effectiveness in mitigating forgetting without pre-training data.
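To make the momentum-filtering idea concrete, below is a minimal sketch of an Adam-style step that, within each parameter tensor, updates only the fraction of entries with the largest first-moment (momentum) magnitudes. The class name `MoFOSketch` and the update fraction `alpha` are illustrative assumptions, not the authors' reference implementation; the paper's method partitions parameters and filters by momentum within each partition, which this per-tensor sketch only approximates.

```python
import torch


class MoFOSketch(torch.optim.Optimizer):
    """Illustrative Adam-style optimizer: within each parameter tensor,
    only the top-`alpha` fraction of entries ranked by first-moment
    (momentum) magnitude are updated; all other entries stay fixed."""

    def __init__(self, params, lr=1e-5, betas=(0.9, 0.999), eps=1e-8, alpha=0.1):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps, alpha=alpha))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["m"] = torch.zeros_like(p)
                    state["v"] = torch.zeros_like(p)
                state["step"] += 1
                m, v, t = state["m"], state["v"], state["step"]
                # Standard Adam moment updates with bias correction.
                m.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                v.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                m_hat = m / (1 - beta1 ** t)
                v_hat = v / (1 - beta2 ** t)
                update = m_hat / (v_hat.sqrt() + group["eps"])
                # Momentum filter: keep roughly the top-`alpha` fraction of
                # entries by |momentum| in this tensor; mask out the rest
                # (ties at the threshold may admit a few extra entries).
                k = max(1, int(group["alpha"] * m.numel()))
                threshold = m.abs().flatten().kthvalue(m.numel() - k + 1).values
                mask = (m.abs() >= threshold).to(update.dtype)
                p.add_(update * mask, alpha=-group["lr"])
```

Under this sketch, `alpha` controls how sparse each update is; setting `alpha=1.0` recovers a plain Adam step, while smaller values keep most parameters at their pre-trained values in any given iteration.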
Cite
Text
Chen et al. "MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning." Transactions on Machine Learning Research, 2025.

Markdown
[Chen et al. "MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/chen2025tmlr-mofo/)

BibTeX
@article{chen2025tmlr-mofo,
title = {{MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning}},
author = {Chen, Yupeng and Wang, Senmiao and Zhang, Yushun and Lin, Zhihang and Zhang, Haozhe and Sun, Weijian and Ding, Tian and Sun, Ruoyu},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/chen2025tmlr-mofo/}
}