Regularized Conditional Diffusion Model for Multi-Task Preference Alignment

Abstract

Sequential decision-making can be formulated as a conditional generation process, with the goals of aligning with human intents and remaining versatile across various tasks. Previous return-conditioned diffusion models achieve comparable performance but rely on well-defined reward functions, which require substantial human effort and face challenges in multi-task settings. Preferences serve as an alternative, but recent work rarely considers preference learning in the presence of multiple tasks. To facilitate alignment and versatility in multi-task preference learning, we adopt multi-task preferences as a unified framework. In this work, we propose to learn preference representations aligned with preference labels, which are then used as conditions to guide the conditional generation process of diffusion models. The traditional classifier-free guidance paradigm suffers from inconsistency between the conditions and the generated trajectories. We thus introduce an auxiliary regularization objective to maximize the mutual information between the conditions and the corresponding generated trajectories, improving their consistency.
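
The following is a minimal sketch, not the authors' implementation, of the idea the abstract describes: a diffusion denoiser conditioned on a learned preference representation, trained with a classifier-free-guidance-style denoising loss plus an auxiliary term that raises a lower bound on the mutual information between the condition and the generated trajectory. All module names, shapes, and the InfoNCE-style form of the regularizer are assumptions made for illustration.

```python
# Illustrative sketch only; network sizes, the encoder, and the InfoNCE-style
# regularizer are assumptions, not the paper's actual architecture or losses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreferenceEncoder(nn.Module):
    """Maps a (flattened) trajectory to a preference representation (assumed MLP)."""
    def __init__(self, traj_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(traj_dim, 256), nn.ReLU(),
                                 nn.Linear(256, emb_dim))

    def forward(self, traj):
        return self.net(traj)

class ConditionalDenoiser(nn.Module):
    """Predicts the noise added to a trajectory, conditioned on the diffusion
    step and the preference embedding (the condition is dropped at random for
    classifier-free guidance)."""
    def __init__(self, traj_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(traj_dim + emb_dim + 1, 256), nn.ReLU(),
                                 nn.Linear(256, traj_dim))

    def forward(self, noisy_traj, t, cond):
        return self.net(torch.cat([noisy_traj, cond, t], dim=-1))

def training_loss(denoiser, encoder, traj, alphas_cumprod, p_uncond=0.1, lam=0.1):
    """One training step: denoising loss + assumed mutual-information regularizer."""
    B, _ = traj.shape
    cond = encoder(traj)                                  # preference representation
    # Classifier-free guidance: randomly replace the condition with zeros.
    mask = (torch.rand(B, 1) > p_uncond).float()
    cond_in = cond * mask
    # Standard forward diffusion: corrupt the trajectory at a random step.
    t = torch.randint(0, len(alphas_cumprod), (B, 1))
    a = alphas_cumprod[t]
    noise = torch.randn_like(traj)
    noisy = a.sqrt() * traj + (1 - a).sqrt() * noise
    eps_hat = denoiser(noisy, t.float() / len(alphas_cumprod), cond_in)
    denoise_loss = F.mse_loss(eps_hat, noise)
    # Auxiliary regularizer (assumed InfoNCE form): the reconstructed trajectory
    # should identify its own condition among the batch, a lower bound on the
    # mutual information between conditions and generated trajectories.
    x0_hat = (noisy - (1 - a).sqrt() * eps_hat) / a.sqrt().clamp(min=1e-5)
    logits = encoder(x0_hat) @ cond.t()                   # [B, B] similarity matrix
    mi_loss = F.cross_entropy(logits, torch.arange(B))
    return denoise_loss + lam * mi_loss
```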

Cite

Text

Yu et al. "Regularized Conditional Diffusion Model for Multi-Task Preference Alignment." Neural Information Processing Systems, 2024. doi:10.52202/079017-4442

Markdown

[Yu et al. "Regularized Conditional Diffusion Model for Multi-Task Preference Alignment." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/yu2024neurips-regularized/) doi:10.52202/079017-4442

BibTeX

@inproceedings{yu2024neurips-regularized,
  title     = {{Regularized Conditional Diffusion Model for Multi-Task Preference Alignment}},
  author    = {Yu, Xudong and Bai, Chenjia and He, Haoran and Wang, Changhong and Li, Xuelong},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-4442},
  url       = {https://mlanthology.org/neurips/2024/yu2024neurips-regularized/}
}