Decoding-Time Language Model Alignment with Multiple Objectives

Abstract

Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives. Here, we propose $\textbf{multi-objective decoding (MOD)}$, a decoding-time algorithm that outputs the next token from a linear combination of the predictions of all base models, for any given weighting over the different objectives. We exploit a common form shared by a family of $f$-divergence-regularized alignment approaches (such as PPO, DPO, and their variants) to identify a closed-form solution via the Legendre transform, and derive an efficient decoding strategy from it. Theoretically, we show why existing approaches can be sub-optimal even in natural settings, and we establish optimality guarantees for our method. Experiments validate our claims.
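To make the decoding-time combination concrete, below is a minimal illustrative sketch (not from the paper): it assumes each objective-specific base model exposes next-token logits, combines them with user-supplied weights, and samples from the result. The plain weighted sum of logits is a hypothetical stand-in for the paper's actual closed-form combination rule obtained via the Legendre transform.

```python
# Illustrative sketch of decoding-time multi-objective combination.
# Assumption: each objective-aligned base model exposes next-token logits.
# The weighted logit sum is a stand-in for MOD's closed-form rule.
import numpy as np


def combine_next_token_logits(logits_per_model, weights):
    """Linearly combine next-token logits from several base models.

    logits_per_model: list of arrays, each of shape (vocab_size,)
    weights: one non-negative weight per objective (assumed to sum to 1)
    """
    return sum(w * l for w, l in zip(weights, logits_per_model))


def sample_next_token(logits_per_model, weights, temperature=1.0, rng=None):
    """Sample a token id from the weight-combined next-token distribution."""
    rng = rng or np.random.default_rng()
    logits = combine_next_token_logits(logits_per_model, weights) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)


# Toy example: two objectives (e.g., helpfulness vs. harmlessness) over a 5-token vocabulary.
logits_a = np.array([2.0, 0.5, -1.0, 0.0, 1.0])   # model aligned to objective A
logits_b = np.array([-0.5, 1.5, 2.0, 0.0, -1.0])  # model aligned to objective B
token = sample_next_token([logits_a, logits_b], weights=[0.7, 0.3])
print("sampled token id:", token)
```

Changing the weights at inference time steers generation toward one objective or another without retraining any model, which is the adaptability the abstract highlights.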

Cite

Text

Shi et al. "Decoding-Time Language Model Alignment with Multiple Objectives." ICML 2024 Workshops: TF2M, 2024.

Markdown

[Shi et al. "Decoding-Time Language Model Alignment with Multiple Objectives." ICML 2024 Workshops: TF2M, 2024.](https://mlanthology.org/icmlw/2024/shi2024icmlw-decodingtime/)

BibTeX

@inproceedings{shi2024icmlw-decodingtime,
  title     = {{Decoding-Time Language Model Alignment with Multiple Objectives}},
  author    = {Shi, Ruizhe and Chen, Yifang and Hu, Yushi and Liu, Alisa and Hajishirzi, Hannaneh and Smith, Noah A. and Du, Simon Shaolei},
  booktitle = {ICML 2024 Workshops: TF2M},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/shi2024icmlw-decodingtime/}
}