CoPL: Collaborative Preference Learning for Personalizing LLMs

Abstract

Personalizing large language models (LLMs) is important for aligning outputs with diverse user preferences, yet existing methods struggle with flexibility and generalization. We propose CoPL (Collaborative Preference Learning), a graph-based collaborative filtering framework that models user-response relationships to enhance preference estimation, particularly in sparse annotation settings. By integrating a mixture of LoRA experts (MoLE), CoPL efficiently fine-tunes LLMs while dynamically balancing shared and user-specific preferences. Additionally, an optimization-free adaptation strategy enables generalization to unseen users without fine-tuning. Experiments on UltraFeedback-P demonstrate that CoPL outperforms existing personalized reward models, effectively capturing both common and controversial preferences and offering a scalable solution for personalized LLM alignment.
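To make the mixture-of-LoRA-experts idea concrete, the following is a minimal illustrative sketch of a linear layer augmented with several LoRA experts whose contributions are gated by a user embedding, so that shared (base) and user-specific (expert mixture) components are combined per user. The class name `MoLELinear`, the gating-by-user-embedding design, and all shapes are assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch only: a frozen base linear layer plus a user-gated
# mixture of LoRA experts. Names and shapes are assumptions, not CoPL's code.
import torch
import torch.nn as nn


class MoLELinear(nn.Module):
    def __init__(self, in_features, out_features, num_experts=4, rank=8, user_dim=32):
        super().__init__()
        # Frozen base weight shared across users (stands in for the pretrained LLM weight).
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False
        # One low-rank (A, B) pair per expert.
        self.lora_A = nn.Parameter(torch.randn(num_experts, rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, out_features, rank))
        # Gate maps a user embedding to mixture weights over the experts.
        self.gate = nn.Linear(user_dim, num_experts)

    def forward(self, x, user_emb):
        # x: (batch, in_features), user_emb: (batch, user_dim)
        weights = torch.softmax(self.gate(user_emb), dim=-1)        # (batch, E)
        # Per-expert LoRA update: B_e @ (A_e @ x)
        h = torch.einsum("erd,bd->ber", self.lora_A, x)             # (batch, E, rank)
        delta = torch.einsum("eor,ber->beo", self.lora_B, h)        # (batch, E, out)
        # User-specific mixture of expert updates added to the shared base output.
        return self.base(x) + torch.einsum("be,beo->bo", weights, delta)


# Usage sketch: a user-conditioned forward pass.
layer = MoLELinear(in_features=16, out_features=16)
x = torch.randn(2, 16)
user_emb = torch.randn(2, 32)
out = layer(x, user_emb)  # shape (2, 16)
```

In this sketch, the frozen base projection captures preferences common to all users, while the softmax-gated combination of LoRA experts supplies the user-specific adjustment; an unseen user could in principle be handled by constructing a gating embedding without further optimization, in the spirit of the optimization-free adaptation described in the abstract.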

Cite

Text

Choi et al. "CoPL: Collaborative Preference Learning for Personalizing LLMs." ICLR 2025 Workshops: Bi-Align, 2025.

Markdown

[Choi et al. "CoPL: Collaborative Preference Learning for Personalizing LLMs." ICLR 2025 Workshops: Bi-Align, 2025.](https://mlanthology.org/iclrw/2025/choi2025iclrw-copl/)

BibTeX

@inproceedings{choi2025iclrw-copl,
  title     = {{CoPL: Collaborative Preference Learning for Personalizing LLMs}},
  author    = {Choi, Youngbin and Cho, Seunghyuk and Lee, Minjong and Park, MoonJeong and Ko, Yesong and Ok, Jungseul and Kim, Dongwoo},
  booktitle = {ICLR 2025 Workshops: Bi-Align},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/choi2025iclrw-copl/}
}