Mapping Social Choice Theory to RLHF

Abstract

Recent work on the limitations of using reinforcement learning from human feedback (RLHF) to incorporate human preferences into model behavior often raises social choice theory as a reference point. Social choice theory’s analysis of settings such as voting mechanisms provides technical infrastructure that can inform how to aggregate human preferences amid disagreement. We analyze the problem settings of social choice and RLHF, and identify differences between them that prevent well-known technical results in social choice from immediately applying to RLHF. We then redefine canonical desiderata from social choice theory for the RLHF context and discuss how they may serve as analytical tools for open problems in RLHF. Finally, we contextualize the role of social choice in the broader political theory literature on democracy and collective decision making.
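As a minimal illustration of why aggregating human preferences amid disagreement is nontrivial, the following sketch applies pairwise majority voting to three hypothetical annotator rankings of candidate responses A, B and C. This is the classic Condorcet paradox from social choice theory, not a method taken from the paper. The rankings and the helper names `pairwise_majority` and `condorcet_winner` are assumptions made for illustration. Each annotator's ranking is transitive, yet the majority relation is cyclic (A beats B, B beats C, C beats A), so no single response is preferred by a majority over every alternative.

```python
from itertools import combinations

# Hypothetical data: three annotators each rank three candidate responses,
# best first. These rankings are illustrative, not data from the paper.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def pairwise_majority(rankings):
    """For each unordered pair of options, return the option a strict majority prefers (None on a tie)."""
    options = sorted(rankings[0])
    winners = {}
    for x, y in combinations(options, 2):
        # Count annotators who rank x above y.
        x_votes = sum(r.index(x) < r.index(y) for r in rankings)
        y_votes = len(rankings) - x_votes
        if x_votes > y_votes:
            winners[(x, y)] = x
        elif y_votes > x_votes:
            winners[(x, y)] = y
        else:
            winners[(x, y)] = None
    return winners


def condorcet_winner(rankings):
    """Return the option that beats every other option by strict majority, or None if none exists."""
    options = sorted(rankings[0])
    for x in options:
        beats_all = all(
            2 * sum(r.index(x) < r.index(y) for r in rankings) > len(rankings)
            for y in options
            if y != x
        )
        if beats_all:
            return x
    return None


majority = pairwise_majority(rankings)
for (x, y), w in majority.items():
    print(f"{x} vs {y}: majority prefers {w}")
print("Condorcet winner:", condorcet_winner(rankings))
```

Running it prints that the majority prefers A over B, C over A and B over C, and that there is no Condorcet winner. Any rule that must output a single ranking or reward ordering from such data has to break the cycle somehow, which is one reason the choice of aggregation rule matters.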

Cite

Text

Dai and Fleisig. "Mapping Social Choice Theory to RLHF." ICLR 2024 Workshops: R2-FM, 2024.

BibTeX

@inproceedings{dai2024iclrw-mapping,
  title     = {{Mapping Social Choice Theory to RLHF}},
  author    = {Dai, Jessica and Fleisig, Eve},
  booktitle = {ICLR 2024 Workshops: R2-FM},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/dai2024iclrw-mapping/}
}