Geometric-Averaged Preference Optimization for Soft Preference Labels

Abstract

Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic. However, human preferences can vary across individuals and should therefore be represented distributionally. In this work, we introduce distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function. This approach adjusts the scale of the learning loss based on the soft labels, such that the loss approaches zero when the responses are closer to equally preferred. This simple modification can be easily applied to any DPO-based method and mitigates the over-optimization and objective mismatch from which prior works suffer. Our experiments simulate the soft preference labels with AI feedback from LLMs and demonstrate that geometric averaging consistently improves performance on standard benchmarks for alignment research. In particular, we observe more preferable responses than with binary labels, and significant improvements where modestly-confident labels are in the majority.
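
To make the weighted geometric average concrete, the following is a minimal sketch of one reading of the abstract, not the authors' implementation. It assumes a soft label p in [0.5, 1] for the probability that response y1 is preferred over y2, and it assumes the winner likelihood is replaced by pi(y1)^p * pi(y2)^(1-p) and the loser likelihood by pi(y1)^(1-p) * pi(y2)^p. Under those assumptions the DPO log-ratio margin is simply multiplied by (2p - 1). The function names, the `margin` argument, and the default `beta` are hypothetical and chosen for illustration only.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid(x):
    return math.exp(log_sigmoid(x))


def dpo_loss(margin, beta=0.1):
    """Standard DPO loss for one pair with a binary label.

    margin = [log pi(y1) - log pi_ref(y1)] - [log pi(y2) - log pi_ref(y2)],
    where y1 is the response labeled as preferred.
    """
    return -log_sigmoid(beta * margin)


def geometric_averaged_dpo_loss(margin, soft_label, beta=0.1):
    """Sketch of a geometric-averaged DPO loss (assumed form, see lead-in).

    Geometric averaging of the two likelihoods with weights soft_label and
    1 - soft_label scales the margin by (2 * soft_label - 1).
    """
    weight = 2.0 * soft_label - 1.0
    return -log_sigmoid(beta * weight * margin)


def loss_gradient_wrt_margin(margin, soft_label, beta=0.1):
    """Derivative of the sketched loss with respect to the margin."""
    weight = 2.0 * soft_label - 1.0
    z = beta * weight * margin
    # d/dz [-log sigmoid(z)] = -sigmoid(-z); the chain rule adds beta * weight.
    return -beta * weight * sigmoid(-z)
```

In this sketch, a soft label of 1.0 recovers the standard DPO loss, while a soft label of 0.5 (equally preferred responses) makes the loss constant in the margin, so its gradient is zero and the pair contributes no learning signal. Intermediate labels such as 0.75 shrink the gradient proportionally.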

Cite

Text

Furuta et al. "Geometric-Averaged Preference Optimization for Soft Preference Labels." Neural Information Processing Systems, 2024. doi:10.52202/079017-1819

Markdown

[Furuta et al. "Geometric-Averaged Preference Optimization for Soft Preference Labels." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/furuta2024neurips-geometricaveraged/) doi:10.52202/079017-1819

BibTeX

@inproceedings{furuta2024neurips-geometricaveraged,
  title     = {{Geometric-Averaged Preference Optimization for Soft Preference Labels}},
  author    = {Furuta, Hiroki and Lee, Kuang-Huei and Gu, Shixiang Shane and Matsuo, Yutaka and Faust, Aleksandra and Zen, Heiga and Gur, Izzeddin},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1819},
  url       = {https://mlanthology.org/neurips/2024/furuta2024neurips-geometricaveraged/}
}