Aligning to Thousands of Preferences via System Message Generalization

Abstract

Current large language model (LLM) alignment methods often assume that aligning LLMs with general public preferences is optimal, overlooking the diversity of individual values. A major challenge in adopting a more individualized approach to LLM alignment is its lack of scalability, as it requires training a new model for each new value or user. We propose a new paradigm in which users specify their values within the system message, steering LLM behavior to align with their individual intentions. However, LLMs are typically trained on a generic system message (e.g., "You are a helpful assistant"). To improve generalization to diverse system messages, we create a system message dataset with 197k value combinations across 66k user instructions. We train a 7B LLM, Janus, and test it on five benchmarks augmented with various unseen system messages that reflect user preferences. Janus achieves high tie+win rates against leading models, including GPT-4. Janus also outperforms LLaMA 3 8B Instruct on general helpfulness benchmarks, suggesting that training with diverse system messages enhances alignment with both individual and general preferences. Code, dataset, benchmark, and models are available at https://github.com/kaistAI/Janus.
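To illustrate the proposed usage pattern, below is a minimal sketch (not from the paper) of querying a system-message-conditioned model with a user-specific value description via the Hugging Face `transformers` chat interface. The model identifier, the example system message, and the user prompt are all placeholders chosen for illustration; see the repository above for the released checkpoints and their exact chat templates.

```python
# Minimal sketch: steer a chat model with a value-laden system message.
# "PLACEHOLDER/janus-7b" is a hypothetical model ID, not the official one.
from transformers import pipeline

chat = pipeline("text-generation", model="PLACEHOLDER/janus-7b")

messages = [
    {
        "role": "system",
        "content": (
            "You are an assistant for a user who values concise, "
            "evidence-based answers and prefers metric units."
        ),
    },
    {"role": "user", "content": "How much water should I drink per day?"},
]

# The pipeline applies the model's chat template to the message list and
# returns the full conversation, with the assistant's reply appended last.
output = chat(messages, max_new_tokens=256)
print(output[0]["generated_text"][-1]["content"])
```

Changing only the system message (e.g., to favor detailed, beginner-friendly explanations) would, under this paradigm, shift the model's responses toward that stated preference without any retraining.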

Cite

Text

Lee et al. "Aligning to Thousands of Preferences via System Message Generalization." NeurIPS 2024 Workshops: Pluralistic-Alignment, 2024.

Markdown

[Lee et al. "Aligning to Thousands of Preferences via System Message Generalization." NeurIPS 2024 Workshops: Pluralistic-Alignment, 2024.](https://mlanthology.org/neuripsw/2024/lee2024neuripsw-aligning/)

BibTeX

@inproceedings{lee2024neuripsw-aligning,
  title     = {{Aligning to Thousands of Preferences via System Message Generalization}},
  author    = {Lee, Seongyun and Park, Sue Hyun and Kim, Seungone and Seo, Minjoon},
  booktitle = {NeurIPS 2024 Workshops: Pluralistic-Alignment},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/lee2024neuripsw-aligning/}
}