Mechanism Design for LLM Fine-Tuning with Multiple Reward Models
Abstract
Fine-tuning large language models (LLMs) to aggregate multiple preferences has attracted considerable research attention. As aggregation algorithms advance, an economic scenario arises in which fine-tuning services are provided to agents with different preferences. In this setting, agents may benefit from strategically misreporting their preferences, which can distort the fine-tuned outcome. This paper addresses these incentive issues by framing them as a mechanism design problem: an LLM provider determines the fine-tuning objective (training rule) and the pricing scheme (payment rule) for agents. We primarily focus on a representative class of training rules that maximize social welfare subject to certain regularizations. First, we show that under most circumstances truthful reporting is sub-optimal when only a training rule is used, which highlights the necessity of payments. Second, we design affine maximizer payment rules that implement these training rules in dominant-strategy incentive compatibility (DSIC). Further, we characterize sufficient conditions for payment equivalence. For a training rule that satisfies these conditions, this identifies all payment rules that implement it in DSIC, since any two of them differ only by a constant term that does not depend on agents' reports.
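The two claims above (misreporting can pay off without payments, and an affine maximizer payment removes that incentive) can be illustrated with a minimal sketch. It is a toy analogue, not the paper's construction: it assumes a finite set of candidate fine-tuned outcomes instead of LLM distributions, a fixed per-outcome penalty `reg` standing in for the regularization term, and the standard weighted VCG form of an affine maximizer payment. The names `train`, `payment`, `utility`, `reg`, and `lam` are hypothetical and chosen for illustration only.

```python
from itertools import product


def train(reports, weights, reg, lam=1.0):
    """Toy training rule (assumption): choose the outcome index that maximizes
    weighted reported welfare minus a regularization penalty."""
    def objective(k):
        return sum(w * r[k] for w, r in zip(weights, reports)) - lam * reg[k]
    return max(range(len(reg)), key=objective)


def payment(i, reports, weights, reg, lam=1.0):
    """Weighted VCG-style affine maximizer payment for agent i (assumption):
    the regularized welfare loss that agent i's report imposes on the others,
    rescaled by agent i's weight."""
    others = [(w, r) for j, (w, r) in enumerate(zip(weights, reports)) if j != i]

    def others_objective(k):
        return sum(w * r[k] for w, r in others) - lam * reg[k]

    chosen = train(reports, weights, reg, lam)
    best_without_i = max(others_objective(k) for k in range(len(reg)))
    return (best_without_i - others_objective(chosen)) / weights[i]


def utility(i, true_value, reports, weights, reg, lam=1.0, pay=True):
    """Agent i's utility: true value of the chosen outcome minus the payment."""
    chosen = train(reports, weights, reg, lam)
    charge = payment(i, reports, weights, reg, lam) if pay else 0.0
    return true_value[chosen] - charge


# Hypothetical instance: two agents, three candidate outcomes.
weights = [1.0, 1.0]
reg = [0.0, 1.0, 2.0]      # penalty per outcome, e.g. distance from the base model
true_0 = [0, 3, 4]         # agent 0's true values
report_1 = [4, 3, 0]       # agent 1's report, held fixed
exaggerated_0 = [0, 0, 10]  # one possible misreport by agent 0

# Without payments, compare truthful reporting against the exaggerated report.
free_truthful = utility(0, true_0, [true_0, report_1], weights, reg, pay=False)
free_misreport = utility(0, true_0, [exaggerated_0, report_1], weights, reg, pay=False)

# With payments, search a grid of misreports for the best achievable utility.
paid_truthful = utility(0, true_0, [true_0, report_1], weights, reg)
paid_best_misreport = max(
    utility(0, true_0, [list(m), report_1], weights, reg)
    for m in product(range(11), repeat=3)
)

print("no payments:", free_truthful, "vs misreport", free_misreport)
print("with payments:", paid_truthful, "vs best grid misreport", paid_best_misreport)
```

In this instance the exaggerated report moves the chosen outcome toward agent 0's favorite when no payment is charged, while under the payment no report on the grid beats truthful reporting. The grid search is only a sanity check on one example, not a proof of DSIC.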
Cite
Text
Sun et al. "Mechanism Design for LLM Fine-Tuning with Multiple Reward Models." NeurIPS 2024 Workshops: Pluralistic-Alignment, 2024.
Markdown
[Sun et al. "Mechanism Design for LLM Fine-Tuning with Multiple Reward Models." NeurIPS 2024 Workshops: Pluralistic-Alignment, 2024.](https://mlanthology.org/neuripsw/2024/sun2024neuripsw-mechanism/)
BibTeX
@inproceedings{sun2024neuripsw-mechanism,
title = {{Mechanism Design for LLM Fine-Tuning with Multiple Reward Models}},
author = {Sun, Haoran and Chen, Yurong and Wang, Siwei and Chen, Wei and Deng, Xiaotie},
booktitle = {NeurIPS 2024 Workshops: Pluralistic-Alignment},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/sun2024neuripsw-mechanism/}
}