RePO: Understanding Preference Learning Through ReLU-Based Optimization
Abstract
Preference learning has become a common approach in recent methods for aligning large language models with human values. These methods optimize the preference margin between chosen and rejected responses, subject to constraints that prevent over-optimization. In this paper, we report the surprising empirical finding that a simple ReLU activation can learn meaningful alignment even with \emph{neither} (i) sigmoid-based gradient constraints nor (ii) explicit regularization terms. Our experiments show that over-optimization does occur, but that a threshold parameter $\gamma$ plays an essential role in preventing it by dynamically filtering training examples. We further provide a theoretical analysis demonstrating that ReLU-based Preference Optimization (RePO) corresponds to the convex envelope of the 0-1 loss, establishing its fundamental soundness. RePO achieves competitive or superior results compared to established preference optimization approaches. We hope this simple baseline will motivate researchers to rethink the fundamental mechanisms behind preference optimization for language model alignment.
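To make the mechanism concrete, the sketch below shows one way a ReLU-based preference loss with a threshold $\gamma$ could be written: pairs whose reward margin already exceeds $\gamma$ receive zero loss and zero gradient, which is the dynamic example filtering the abstract refers to. This is a minimal sketch under stated assumptions, not the paper's exact objective; the function name `repo_loss`, the scaling factor `beta`, and the use of (possibly length-normalized) policy log-probabilities as the reward signal are illustrative choices.

```python
import torch
import torch.nn.functional as F

def repo_loss(chosen_logps, rejected_logps, beta=1.0, gamma=0.5):
    """Sketch of a ReLU-based preference loss (hypothetical parameterization).

    chosen_logps / rejected_logps: (batch,) policy log-probabilities
    (e.g. length-normalized sequence log-probs) for the chosen and
    rejected responses. gamma is the margin threshold: pairs whose
    scaled margin already exceeds gamma get zero loss and zero gradient,
    so they are effectively filtered out of the update.
    """
    margin = beta * (chosen_logps - rejected_logps)
    # Hinge / ReLU form: penalize only pairs whose margin falls below gamma.
    return F.relu(gamma - margin).mean()

# Hypothetical usage with dummy scores:
chosen = torch.tensor([-1.2, -0.8, -2.0])
rejected = torch.tensor([-1.5, -0.7, -3.1])
loss = repo_loss(chosen, rejected, beta=2.0, gamma=0.5)
```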
Cite
Text
Wu et al. "RePO: Understanding Preference Learning Through ReLU-Based Optimization." Advances in Neural Information Processing Systems, 2025.
Markdown
[Wu et al. "RePO: Understanding Preference Learning Through ReLU-Based Optimization." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/wu2025neurips-repo/)
BibTeX
@inproceedings{wu2025neurips-repo,
title = {{RePO: Understanding Preference Learning Through ReLU-Based Optimization}},
author = {Wu, Junkang and Huang, Kexin and Wang, Xue and Gao, Jinyang and Ding, Bolin and Wu, Jiancan and He, Xiangnan and Wang, Xiang},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/wu2025neurips-repo/}
}