Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-Wolfe Method

Abstract

We study learning human preferences from limited comparison feedback, a core machine learning problem that is at the center of reinforcement learning from human feedback (RLHF). We formulate the problem as learning a Plackett-Luce (PL) model from a limited number of $K$-subset comparisons over a universe of $N$ items, where typically $K \ll N$. Our objective is to select the $K$-subsets such that all items can be ranked with minimal mistakes within the comparison budget. We solve the problem using the D-optimal design, which minimizes the worst-case ranking loss under the estimated PL model. All known algorithms for this problem are computationally infeasible in our setting because the number of candidate $K$-subsets grows exponentially in $K$. To address this challenge, we propose a randomized Frank-Wolfe algorithm with memoization and sparse updates that has a low $O(N^2 + K^2)$ per-iteration complexity. We analyze this algorithm and demonstrate its empirical superiority on synthetic and open-source NLP datasets.
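To make the described iteration concrete, below is a minimal Python sketch of a D-optimal design computed with a randomized Frank-Wolfe step: rather than scanning all $\binom{N}{K}$ subsets, the linear minimization oracle scores a small random batch of $K$-subsets and keeps the best, while the design covariance is maintained incrementally. The pairwise-difference information matrix, the regularization eps, the candidate count num_candidates, and all function names here are illustrative assumptions, not the paper's exact construction.

import itertools
import numpy as np

rng = np.random.default_rng(0)

def info_matrix(subset, N):
    """Information matrix of one K-subset: sum of outer products of
    pairwise difference vectors e_i - e_j over items in the subset
    (a simplified surrogate for the PL model's Fisher information)."""
    A = np.zeros((N, N))
    for i, j in itertools.combinations(subset, 2):
        d = np.zeros(N)
        d[i], d[j] = 1.0, -1.0
        A += np.outer(d, d)
    return A

def randomized_fw_design(N, K, num_iters=200, num_candidates=50, eps=1e-3):
    """Approximate D-optimal design over K-subsets via Frank-Wolfe with a
    randomized linear minimization oracle (sample candidates, keep the best)."""
    design = {}                  # sparse design: subset -> weight
    V = eps * np.eye(N)          # regularized design covariance (memoized)
    for t in range(num_iters):
        # Full inverse for clarity; the paper's memoization and sparse
        # updates presumably avoid recomputing this from scratch.
        V_inv = np.linalg.inv(V)
        # Randomized LMO: score a small random batch of K-subsets by the
        # Frank-Wolfe gradient tr(V^{-1} A_S) instead of all C(N, K) subsets.
        best_subset, best_score = None, -np.inf
        for _ in range(num_candidates):
            S = tuple(sorted(map(int, rng.choice(N, size=K, replace=False))))
            score = np.trace(V_inv @ info_matrix(S, N))
            if score > best_score:
                best_subset, best_score = S, score
        gamma = 2.0 / (t + 2)    # standard Frank-Wolfe step size
        # Update the memoized covariance; the new information matrix is
        # nonzero only on the K chosen rows and columns.
        V = (1.0 - gamma) * V + gamma * (info_matrix(best_subset, N) + eps * np.eye(N))
        design = {S: (1.0 - gamma) * w for S, w in design.items()}
        design[best_subset] = design.get(best_subset, 0.0) + gamma
    return design, V

design, V = randomized_fw_design(N=20, K=4)
print(sorted(design.items(), key=lambda kv: -kv[1])[:5])  # heaviest subsets

The key design choice illustrated here is trading the exhaustive linear minimization oracle, which is intractable for exponentially many subsets, for a sampled one, at the cost of a noisier but far cheaper per-iteration step.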

Cite

Text

Thekumparampil et al. "Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-Wolfe Method." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Thekumparampil et al. "Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-Wolfe Method." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/thekumparampil2025icml-comparing/)

BibTeX

@inproceedings{thekumparampil2025icml-comparing,
  title     = {{Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-Wolfe Method}},
  author    = {Thekumparampil, Kiran Koshy and Hiranandani, Gaurush and Kalantari, Kousha and Sabach, Shoham and Kveton, Branislav},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {59355--59376},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/thekumparampil2025icml-comparing/}
}