Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
Abstract
Direct Preference Optimization (DPO) has emerged as a popular algorithm for aligning pretrained large language models with human preferences, owing to its simplicity and training stability. However, DPO suffers from the recently identified *squeezing effect* (also known as *likelihood displacement*), where the probability of preferred responses decreases unintentionally during training. To understand and mitigate this phenomenon, we develop a theoretical framework that models the coordinate-wise dynamics in logit space. Our analysis reveals that negative-gradient updates cause residuals to expand rapidly along high-curvature directions, which underlies the squeezing effect, whereas Sharpness-Aware Minimization (SAM) can suppress this behavior through its curvature-regularization effect. Building on this insight, we investigate *logits-SAM*, a computationally efficient variant that perturbs only the output layer with negligible overhead. Extensive experiments on Pythia-2.8B, Mistral-7B, and Gemma-2B-IT across multiple datasets and benchmarks demonstrate that logits-SAM consistently improves the effectiveness of DPO and integrates seamlessly with other DPO variants. Code is available at <https://github.com/RitianLuo/logits-sam-dpo>.
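The two-step SAM update described above can be illustrated on a toy problem. The sketch below is not the authors' implementation (see the linked repository for that). It assumes single-token responses that share one prompt, treats the logit vector itself as the trainable parameter as a stand-in for perturbing the output layer, and uses hypothetical names (`dpo_loss_and_grad`, `sam_step`) and hypothetical values for `beta`, `rho`, and `lr`.

```python
import math


def log_softmax(z):
    """Numerically stable log-softmax over a list of logits."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def dpo_loss_and_grad(z, z_ref, w, l, beta=0.1):
    """Toy DPO loss and its gradient with respect to the policy logits z.

    Assumption: each response is a single token, so w and l are the token
    indices of the preferred and rejected response under the same prompt.
    """
    lp, lp_ref = log_softmax(z), log_softmax(z_ref)
    margin = beta * ((lp[w] - lp_ref[w]) - (lp[l] - lp_ref[l]))
    loss = math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
    # d(loss)/d(margin) = -1 / (1 + exp(margin)); d(margin)/dz = beta * (e_w - e_l)
    coeff = -beta / (1.0 + math.exp(margin))
    grad = [0.0] * len(z)
    grad[w] += coeff
    grad[l] -= coeff
    return loss, grad


def sam_step(z, z_ref, w, l, beta=0.1, rho=0.05, lr=1.0):
    """One SAM-style update applied in logit space (illustrative only)."""
    # Step 1: ascend to the approximate worst-case point within an L2 ball of radius rho.
    _, g = dpo_loss_and_grad(z, z_ref, w, l, beta)
    norm = math.sqrt(sum(v * v for v in g)) or 1.0
    z_adv = [zi + rho * gi / norm for zi, gi in zip(z, g)]
    # Step 2: take the gradient at the perturbed logits and apply it to the original logits.
    _, g_sam = dpo_loss_and_grad(z_adv, z_ref, w, l, beta)
    return [zi - lr * gi for zi, gi in zip(z, g_sam)]


if __name__ == "__main__":
    z = [0.2, -0.1, 0.0, 0.3]   # hypothetical policy logits
    z_ref = list(z)             # reference model starts identical to the policy
    z_new = sam_step(z, z_ref, w=0, l=1)
    print(dpo_loss_and_grad(z, z_ref, 0, 1)[0], dpo_loss_and_grad(z_new, z_ref, 0, 1)[0])
```

In this toy setting the perturbation needs no additional pass through the network body, only a second evaluation of the loss on shifted logits, which is the intuition behind the low overhead claimed for the output-layer variant.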
Cite
Text
Luo et al. "Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization." International Conference on Learning Representations, 2026.
Markdown
[Luo et al. "Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/luo2026iclr-sharpnessaware/)
BibTeX
@inproceedings{luo2026iclr-sharpnessaware,
title = {{Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization}},
author = {Luo, Haocheng and Deng, Zehang and Do, Thanh-Toan and Harandi, Mehrtash and Phung, Dinh and Le, Trung},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/luo2026iclr-sharpnessaware/}
}