Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization

Abstract

Direct Preference Optimization (DPO) has emerged as a popular algorithm for aligning pretrained large language models with human preferences, owing to its simplicity and training stability. However, DPO suffers from the recently identified *squeezing effect* (also known as *likelihood displacement*), where the probability of preferred responses decreases unintentionally during training. To understand and mitigate this phenomenon, we develop a theoretical framework that models the coordinate-wise dynamics in logit space. Our analysis reveals that negative-gradient updates cause residuals to expand rapidly along high-curvature directions, which underlies the squeezing effect, whereas Sharpness-Aware Minimization (SAM) can suppress this behavior through its curvature-regularization effect. Building on this insight, we investigate *logits-SAM*, a computationally efficient variant that perturbs only the output layer with negligible overhead. Extensive experiments on Pythia-2.8B, Mistral-7B, and Gemma-2B-IT across multiple datasets and benchmarks demonstrate that logits-SAM consistently improves the effectiveness of DPO and integrates seamlessly with other DPO variants. Code is available at <https://github.com/RitianLuo/logits-sam-dpo>.
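
To make the idea of perturbing only the output layer concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes a toy setting that does not come from the paper: single-token responses, a fixed hidden vector `h`, and a linear output layer `W` whose logits are `W @ h`. The names `dpo_loss_and_grad` and `sam_step` and the values of `rho`, `lr`, and `beta` are hypothetical choices for illustration. Each step takes a normalized ascent step of radius `rho` on `W` alone, recomputes the gradient at the perturbed weights, and applies that gradient at the original weights, which is the standard two-pass SAM update restricted to the last layer.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()                      # shift for numerical stability
    return z - np.log(np.exp(z).sum())

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dpo_loss_and_grad(W, h, ref_logp, w, l, beta=0.1):
    """Toy single-token DPO loss and its gradient w.r.t. the output weights W.

    w / l are the token ids of the preferred / dispreferred response (w != l).
    """
    logp = log_softmax(W @ h)
    margin = beta * ((logp[w] - ref_logp[w]) - (logp[l] - ref_logp[l]))
    loss = np.logaddexp(0.0, -margin)    # equals -log(sigmoid(margin))
    # Gradient w.r.t. the logits: the softmax normalizer cancels between the
    # two log-probabilities, leaving only the w and l coordinates.
    dz = np.zeros_like(logp)
    dz[w] = -beta * sigmoid(-margin)
    dz[l] = beta * sigmoid(-margin)
    return loss, np.outer(dz, h)         # chain rule through logits = W @ h

def sam_step(W, h, ref_logp, w, l, lr=0.5, rho=0.05, beta=0.1):
    """One SAM update applied to the output layer only (assumed setup)."""
    _, g = dpo_loss_and_grad(W, h, ref_logp, w, l, beta)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # normalized ascent step
    _, g_sam = dpo_loss_and_grad(W + eps, h, ref_logp, w, l, beta)
    return W - lr * g_sam                # descend using the perturbed gradient

rng = np.random.default_rng(0)
vocab, dim = 8, 4
h = rng.normal(size=dim)
W = rng.normal(size=(vocab, dim))
ref_logp = log_softmax(W @ h)            # reference policy = initial policy
w, l = 2, 5

loss_before, _ = dpo_loss_and_grad(W, h, ref_logp, w, l)
for _ in range(20):
    W = sam_step(W, h, ref_logp, w, l)
loss_after, _ = dpo_loss_and_grad(W, h, ref_logp, w, l)
print(loss_before, loss_after)
```

Because only `W` is perturbed, the second gradient evaluation in this sketch reuses the same hidden vector `h`. In a real language model this would correspond to avoiding a second pass through the transformer body, which is one plausible reading of the "negligible overhead" claim above.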

Cite

Text

Luo et al. "Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization." International Conference on Learning Representations, 2026.

Markdown

[Luo et al. "Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/luo2026iclr-sharpnessaware/)

BibTeX

@inproceedings{luo2026iclr-sharpnessaware,
  title     = {{Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization}},
  author    = {Luo, Haocheng and Deng, Zehang and Do, Thanh-Toan and Harandi, Mehrtash and Phung, Dinh and Le, Trung},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/luo2026iclr-sharpnessaware/}
}