HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework

Yinuo Ren, Tesi Xiao, Michael Shavlovsky, Lexing Ying, Holakou Rahmanian

NeurIPSW 2024

Abstract

In LLM alignment and many other ML applications, one often faces the *Multi-Objective Fine-Tuning (MOFT)* problem, *i.e.*, fine-tuning an existing model on datasets labeled with respect to different objectives simultaneously. To address this challenge, we propose the *HyperDPO* framework, a conditioned one-shot fine-tuning approach that extends the Direct Preference Optimization (DPO) technique, originally developed for efficient LLM alignment with preference data, to accommodate MOFT settings. By substituting the Bradley-Terry-Luce model in DPO with the Plackett-Luce model, our framework can handle a wide range of MOFT tasks that involve listwise ranking datasets. Compared with previous approaches, HyperDPO enjoys an efficient one-shot training process for profiling the Pareto front of auxiliary objectives and offers post-training control over trade-offs. Additionally, we propose a novel *Hyper Prompt Tuning* design that conveys continuous importance weights across objectives to transformer-based models without altering their architecture, and we investigate the potential of *temperature-conditioned networks* for enhancing the flexibility of post-training control. We demonstrate the effectiveness and efficiency of the HyperDPO framework through its applications to various tasks, including Learning-to-Rank (LTR) and LLM alignment, highlighting its viability for large-scale ML deployments.
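
The substitution of the Bradley-Terry-Luce model with the Plackett-Luce model can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the function names `plackett_luce_nll` and `bradley_terry_nll` are hypothetical, and the inputs are plain floats standing in for whatever per-item scores a model assigns (in a DPO-style method these would typically be implicit rewards derived from the policy and a reference model, which is assumed here rather than shown). The sketch only demonstrates that the listwise Plackett-Luce likelihood reduces to the pairwise Bradley-Terry-Luce likelihood when a list has two items.

```python
import math


def plackett_luce_nll(scores):
    """Negative log-likelihood of a ranking under the Plackett-Luce model.

    `scores` is ordered by the observed ranking: scores[0] belongs to the
    top-ranked item and scores[-1] to the bottom-ranked item.
    """
    nll = 0.0
    # The last position contributes log(exp(s) / exp(s)) = 0, so it is skipped.
    for k in range(len(scores) - 1):
        tail = scores[k:]  # items still "available" at position k
        m = max(tail)      # shift by the max for numerical stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        nll += log_norm - scores[k]
    return nll


def bradley_terry_nll(score_preferred, score_rejected):
    """Pairwise Bradley-Terry-Luce loss: -log sigmoid(s_preferred - s_rejected)."""
    diff = score_preferred - score_rejected
    return math.log1p(math.exp(-diff))


if __name__ == "__main__":
    # With two items, the listwise and pairwise losses coincide.
    print(plackett_luce_nll([1.2, -0.3]), bradley_terry_nll(1.2, -0.3))
    # A ranking that agrees with the scores is more likely than its reverse.
    print(plackett_luce_nll([2.0, 1.0, 0.0]) < plackett_luce_nll([0.0, 1.0, 2.0]))
```

Because the two-item case recovers the pairwise loss exactly, a listwise formulation of this kind can cover both pairwise preference data and longer ranked lists with one objective.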

Cite

Text

Ren et al. "HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework." NeurIPS 2024 Workshops: FITML, 2024.

Markdown

[Ren et al. "HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework." NeurIPS 2024 Workshops: FITML, 2024.](https://mlanthology.org/neuripsw/2024/ren2024neuripsw-hyperdpo/)

BibTeX

@inproceedings{ren2024neuripsw-hyperdpo,
  title     = {{HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework}},
  author    = {Ren, Yinuo and Xiao, Tesi and Shavlovsky, Michael and Ying, Lexing and Rahmanian, Holakou},
  booktitle = {NeurIPS 2024 Workshops: FITML},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/ren2024neuripsw-hyperdpo/}
}