Margin-Aware Preference Optimization for Aligning Diffusion Models Without Reference

Abstract

Preference alignment methods (such as DPO) typically rely on divergence regularization for stability but struggle with reference mismatch when preference data deviates from the reference model. In this paper, we identify the negative impacts of reference mismatch in aligning text-to-image (T2I) diffusion models. Motivated by this analysis, we propose a reference-agnostic alignment method for T2I diffusion models, coined margin-aware preference optimization (MaPO). By dropping the reference model, MaPO enables a new way to address diverse T2I downstream tasks with varying levels of reference mismatch. We validate this with five representative T2I tasks: (1) preference alignment, (2) cultural representation, (3) safe generation, (4) style learning, and (5) personalization. MaPO surpasses Diffusion DPO as the level of reference mismatch increases, and also outperforms task-specific methods such as DreamBooth. Additionally, MaPO is more efficient in both training time and memory without compromising quality.
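To make the core idea concrete, below is a minimal, hypothetical sketch of a reference-free, margin-based preference loss of the kind the abstract describes. It is not the authors' exact objective: the function name, the `beta` scale, and the use of per-sample denoising errors as likelihood surrogates are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def margin_preference_loss(err_chosen: torch.Tensor,
                           err_rejected: torch.Tensor,
                           beta: float = 1.0) -> torch.Tensor:
    """Hypothetical sketch of a reference-agnostic margin loss.

    err_chosen / err_rejected: per-sample diffusion denoising errors
    (e.g. MSE between predicted and true noise) for the preferred and
    dispreferred images. Lower denoising error ~ higher model likelihood.
    """
    # Margin term: push the preferred sample's denoising error below
    # the dispreferred one's, without comparing against a reference model.
    margin = err_rejected - err_chosen
    margin_loss = -F.logsigmoid(beta * margin).mean()

    # Likelihood term: keep fitting the preferred samples directly,
    # standing in for the stability a reference model would otherwise provide.
    chosen_loss = err_chosen.mean()

    return margin_loss + chosen_loss
```

Because no reference model is queried, such an objective needs only one forward pass per image pair through the trained network, which is consistent with the training-time and memory savings the abstract reports (under the stated assumptions of this sketch).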

Cite

Text

Hong et al. "Margin-Aware Preference Optimization for Aligning Diffusion Models Without Reference." ICLR 2025 Workshops: SCOPE, 2025.

Markdown

[Hong et al. "Margin-Aware Preference Optimization for Aligning Diffusion Models Without Reference." ICLR 2025 Workshops: SCOPE, 2025.](https://mlanthology.org/iclrw/2025/hong2025iclrw-marginaware/)

BibTeX

@inproceedings{hong2025iclrw-marginaware,
  title     = {{Margin-Aware Preference Optimization for Aligning Diffusion Models Without Reference}},
  author    = {Hong, Jiwoo and Paul, Sayak and Lee, Noah and Rasul, Kashif and Thorne, James and Jeong, Jongheon},
  booktitle = {ICLR 2025 Workshops: SCOPE},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/hong2025iclrw-marginaware/}
}