Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
Abstract
The canonical setup of learning a reward model (RM) from human preferences with binary feedback discards potentially useful samples (such as "tied" responses) and loses fine-grained information (such as "slightly better"). This paper proposes a framework for learning RMs under ordinal feedback, which generalizes binary feedback to arbitrary granularity. We first identify a marginal unbiasedness condition that generalizes the existing assumption underlying binary feedback; the condition is validated via the sociological concept of the "wisdom of the crowd". Under this condition, we develop a natural probability model and prove the benefits of fine-grained feedback in terms of reducing the Rademacher complexity, a result that may be of independent interest for another problem: the bias-variance trade-off in knowledge distillation. The framework also sheds light on the design of guidelines for human annotators. Our numerical experiments validate that (1) fine-grained feedback leads to better RM learning in both in-distribution and out-of-distribution settings, and (2) incorporating a certain proportion of tied samples boosts RM learning.
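As a concrete illustration of this setup (a sketch only; the symbols $r_\theta$, $\sigma$, and the label set below are illustrative assumptions and may differ from the paper's exact probability model), the ordinal-feedback objective can be viewed as a soft-label version of the Bradley-Terry cross-entropy loss:

\[
  \ell(\theta) \;=\; -\,\mathbb{E}_{(x,\,y_1,\,y_2,\,z)}\Bigl[\, z \,\log \sigma\bigl(r_\theta(x, y_1) - r_\theta(x, y_2)\bigr) \;+\; (1 - z)\,\log \sigma\bigl(r_\theta(x, y_2) - r_\theta(x, y_1)\bigr) \Bigr],
\]

where $\sigma$ is the logistic function and the ordinal label $z$ takes values in, for example, $\{0, \tfrac14, \tfrac12, \tfrac34, 1\}$, encoding "worse", "slightly worse", "tied", "slightly better", "better". Binary feedback is recovered with $z \in \{0, 1\}$, and a tied sample corresponds to $z = \tfrac12$. Under this reading, a marginal unbiasedness condition would require $\mathbb{E}[z \mid x, y_1, y_2]$ to equal the true preference probability of $y_1$ over $y_2$.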
Cite
Text
Liu et al. "Reward Modeling with Ordinal Feedback: Wisdom of the Crowd." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Liu et al. "Reward Modeling with Ordinal Feedback: Wisdom of the Crowd." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/liu2025icml-reward/)
BibTeX
@inproceedings{liu2025icml-reward,
title = {{Reward Modeling with Ordinal Feedback: Wisdom of the Crowd}},
author = {Liu, Shang and Pan, Yu and Chen, Guanting and Li, Xiaocheng},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {39190--39218},
volume = {267},
url = {https://mlanthology.org/icml/2025/liu2025icml-reward/}
}