MPAW: Multi-Preference Alignment Through Weak Model Collaboration for Efficient and Flexible LLM Decoding

Chen, Nuo; Xiong, Guojun; He, Bingsheng

ICLRW 2025

Abstract

Aligning large language models (LLMs) with diverse and competing human preferences remains a critical challenge for safe and effective deployment. While recent work demonstrates that decoding-time alignment via weak preference models achieves strong performance with minimal compute, existing methods optimize for a single objective, severely limiting their adaptability to real-world scenarios that require multifaceted trade-offs (e.g., safety vs. helpfulness). We propose Multi-Preference Alignment through Weak Model Collaboration (MPAW), a scalable framework that aggregates guidance from heterogeneous weak preference models (smaller LLMs aligned to distinct objectives) into a unified decoding strategy. By dynamically integrating signals from specialized proxies (e.g., safety classifiers, conciseness scorers), MPAW preserves the generalization capabilities of large base models while enabling zero-shot adaptation to arbitrary preference weightings. Empirical results demonstrate reliable alignment quality that nearly matches the performance of computationally expensive multi-objective RLHF fine-tuning. Our findings establish weak model collaboration as a principled pathway for efficient, flexible LLM alignment without retraining.
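The abstract does not state the rule MPAW uses to combine signals, so the following is only a minimal sketch of the general idea of weighted decoding-time aggregation. It assumes that each weak preference model yields one score per candidate token and that these scores are added to the base model's logits, scaled by user-chosen weights. The function names (`combine_logits`, `greedy_token`), the additive rule, and the toy numbers are all hypothetical illustrations, not the authors' method.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(base_logits, preference_scores, weights):
    """Add a weighted sum of per-token preference scores to the base logits.

    ASSUMPTION: an additive combination rule; the paper may use a different one.
    base_logits: list of floats, one per vocabulary token.
    preference_scores: list of score lists, one list per weak preference model.
    weights: one non-negative float per weak preference model.
    """
    if len(preference_scores) != len(weights):
        raise ValueError("need exactly one weight per preference model")
    combined = list(base_logits)
    for scores, w in zip(preference_scores, weights):
        if len(scores) != len(base_logits):
            raise ValueError("score vector must match vocabulary size")
        for i, s in enumerate(scores):
            combined[i] += w * s
    return combined


def greedy_token(base_logits, preference_scores, weights):
    """Return the index of the highest-scoring token after aggregation."""
    combined = combine_logits(base_logits, preference_scores, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Toy vocabulary of three tokens with made-up numbers (hypothetical data).
base = [2.0, 1.0, 0.0]       # base model prefers token 0
safety = [-3.0, 0.0, 0.0]    # hypothetical safety proxy penalizes token 0
concise = [0.0, 0.0, 2.5]    # hypothetical conciseness proxy favors token 2

for w in ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0]):
    token = greedy_token(base, [safety, concise], w)
    probs = softmax(combine_logits(base, [safety, concise], w))
    print(w, token, [round(p, 3) for p in probs])
```

Under these assumptions, changing the weight vector alone changes which token is selected, with no retraining of the base model or of either proxy. This is the kind of zero-shot adaptation to preference weightings that the abstract describes.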

Cite

Text

Chen et al. "MPAW: Multi-Preference Alignment Through Weak Model Collaboration for Efficient and Flexible LLM Decoding." ICLR 2025 Workshops: SSI-FM, 2025.

Markdown

[Chen et al. "MPAW: Multi-Preference Alignment Through Weak Model Collaboration for Efficient and Flexible LLM Decoding." ICLR 2025 Workshops: SSI-FM, 2025.](https://mlanthology.org/iclrw/2025/chen2025iclrw-mpaw/)

BibTeX

@inproceedings{chen2025iclrw-mpaw,
  title     = {{MPAW: Multi-Preference Alignment Through Weak Model Collaboration for Efficient and Flexible LLM Decoding}},
  author    = {Chen, Nuo and Xiong, Guojun and He, Bingsheng},
  booktitle = {ICLR 2025 Workshops: SSI-FM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/chen2025iclrw-mpaw/}
}