Generalizing Offline Alignment Theoretical Paradigm with Diverse Divergence Constraints

Abstract

The growing capabilities of large language models (LLMs) make effective AI alignment necessary. Learning from preference-based feedback has recently become a popular approach to aligning LLMs with human preferences. Although the aligned models demonstrate impressive capabilities across various tasks, the underlying methods lack a unified theoretical framework and a deeper theoretical understanding. In this work, we propose a unified theoretical paradigm for human preference-based optimization, Unified Preference Optimization (UPO), which can be proven to be a generalization of $\Psi$PO. Because UPO generalizes the practical algorithms, understanding it yields a deeper theoretical comprehension of them. Furthermore, we explore a specific case of UPO obtained by setting the mapping to the identity. From this case we develop a novel practical algorithm, Identity Unified Preference Optimization (IUPO), which can be shown to generalize IPO under diverse divergence constraints. Our experiments comparing JS-divergence-based IUPO with IPO on a GPT-2 fine-tuning task show that IUPO, particularly JS-IUPO, outperforms IPO.

Cite

Text

Sun et al. "Generalizing Offline Alignment Theoretical Paradigm with Diverse Divergence Constraints." ICML 2024 Workshops: MFHAIA, 2024.

BibTeX

@inproceedings{sun2024icmlw-generalizing,
  title     = {{Generalizing Offline Alignment Theoretical Paradigm with Diverse Divergence Constraints}},
  author    = {Sun, Haoyuan and Zheng, Yuxin and Zhao, Yifei and Chang, Yongzhe and Wang, Xueqian},
  booktitle = {ICML 2024 Workshops: MFHAIA},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/sun2024icmlw-generalizing/}
}