HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework
Abstract
In LLM alignment and many other ML applications, one often faces the *Multi-Objective Fine-Tuning (MOFT)* problem, *i.e.*, fine-tuning an existing model simultaneously on datasets labeled with respect to different objectives. To address this challenge, we propose the *HyperDPO* framework, a conditioned one-shot fine-tuning approach that extends the Direct Preference Optimization (DPO) technique, originally developed for efficient LLM alignment with preference data, to the MOFT setting. By substituting the Bradley-Terry-Luce model in DPO with the Plackett-Luce model, our framework can handle a wide range of MOFT tasks that involve listwise ranking datasets. Compared with previous approaches, HyperDPO enjoys an efficient one-shot training process for profiling the Pareto front of auxiliary objectives and offers post-training control over trade-offs. Additionally, we propose a novel *Hyper Prompt Tuning* design that conveys continuous importance weights across objectives to transformer-based models without altering their architecture, and we investigate the potential of *temperature-conditioned networks* for enhancing the flexibility of post-training control. We demonstrate the effectiveness and efficiency of the HyperDPO framework through its applications to various tasks, including Learning-to-Rank (LTR) and LLM alignment, highlighting its viability for large-scale ML deployments.
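To make the Bradley-Terry-Luce to Plackett-Luce substitution concrete, the following is a minimal sketch of a listwise, DPO-style loss. It is not the authors' implementation: the function names, the default `beta`, and the use of a plain weighted sum to combine per-objective losses are all assumptions made for illustration. The scores follow the standard DPO implicit reward, `beta * (log pi_theta - log pi_ref)`, and each list is assumed to be ordered from most to least preferred.

```python
import math


def logsumexp(xs):
    # Numerically stable log(sum(exp(x))) over a non-empty list.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def implicit_rewards(policy_logps, ref_logps, beta=0.1):
    # DPO-style implicit reward per candidate: beta * (log pi_theta - log pi_ref).
    # beta=0.1 is an illustrative default, not a value taken from the paper.
    return [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]


def plackett_luce_nll(scores_in_rank_order):
    # Negative log-likelihood of a full ranking under the Plackett-Luce model.
    # scores_in_rank_order[0] belongs to the top-ranked item, and so on.
    # At each position k, the item is "chosen" from the items not yet ranked.
    nll = 0.0
    for k in range(len(scores_in_rank_order)):
        nll -= scores_in_rank_order[k] - logsumexp(scores_in_rank_order[k:])
    return nll


def weighted_multi_objective_loss(scores_per_objective, weights):
    # Hypothetical combination rule (an assumption): a weighted sum of the
    # per-objective listwise losses, with one ranking per objective.
    return sum(w * plackett_luce_nll(s)
               for w, s in zip(weights, scores_per_objective))
```

With two candidates, `plackett_luce_nll([a, b])` equals `-log(sigmoid(a - b))`, which is the pairwise Bradley-Terry-Luce term used in DPO, so the listwise loss can be read as a direct generalization of the pairwise one.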
Cite
Text
Ren et al. "HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework." NeurIPS 2024 Workshops: FITML, 2024.
Markdown
[Ren et al. "HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework." NeurIPS 2024 Workshops: FITML, 2024.](https://mlanthology.org/neuripsw/2024/ren2024neuripsw-hyperdpo/)
BibTeX
@inproceedings{ren2024neuripsw-hyperdpo,
title = {{HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework}},
author = {Ren, Yinuo and Xiao, Tesi and Shavlovsky, Michael and Ying, Lexing and Rahmanian, Holakou},
booktitle = {NeurIPS 2024 Workshops: FITML},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/ren2024neuripsw-hyperdpo/}
}