From Fairness to Truthfulness: Rethinking Data Valuation Design
Abstract
As large language models increasingly rely on external data sources, fairly compensating data contributors has become a central concern. In this paper, we revisit the design of data markets through a game-theoretic lens, where data owners face private, heterogeneous costs for data sharing. We show that commonly used valuation methods, such as Leave-One-Out and Data Shapley, fail to ensure truthful reporting of these costs, leading to inefficient market outcomes. To address this, we adapt well-established payment rules from mechanism design, namely Myerson and Vickrey-Clarke-Groves (VCG), to the data market setting. We demonstrate that the Myerson payment is the minimal truthful payment mechanism, optimal from the buyer’s perspective, and that VCG and Myerson payments coincide in unconstrained allocation settings. Our findings highlight the importance of incorporating incentive compatibility into data valuation, paving the way for more robust and efficient data markets.
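To make the payment rules concrete, the following is a minimal sketch of a VCG-style payment in a toy procurement setting. It is not the paper's construction: the additive value function, the brute-force search over subsets, and the names `best_set` and `vcg_payments` are all assumptions made for illustration. The buyer selects the set of owners that maximizes value minus reported costs, and each selected owner is paid their reported cost plus the welfare they add.

```python
from itertools import combinations


def best_set(value, bids, owners):
    """Return (welfare, set) maximizing value(S) - sum of bids over S.

    Brute force over all subsets; only suitable for a handful of owners.
    """
    best_w, best_s = value(frozenset()), frozenset()
    for r in range(1, len(owners) + 1):
        for combo in combinations(owners, r):
            s = frozenset(combo)
            w = value(s) - sum(bids[i] for i in s)
            if w > best_w:  # strict: ties keep the earlier, smaller set
                best_w, best_s = w, s
    return best_w, best_s


def vcg_payments(value, bids):
    """Select the welfare-maximizing owners and compute VCG payments.

    A selected owner i receives bids[i] + (W_all - W_without_i);
    unselected owners receive 0.
    """
    owners = sorted(bids)
    w_all, chosen = best_set(value, bids, owners)
    pay = {}
    for i in owners:
        if i in chosen:
            others = [j for j in owners if j != i]
            w_without, _ = best_set(value, bids, others)
            pay[i] = bids[i] + (w_all - w_without)
        else:
            pay[i] = 0.0
    return chosen, pay


# Hypothetical numbers: per-owner data values (assumed additive) and costs.
vals = {"a": 5.0, "b": 3.0, "c": 2.0}
costs = {"a": 2.0, "b": 4.0, "c": 1.0}


def additive_value(s):
    return sum(vals[i] for i in s)


chosen, pay = vcg_payments(additive_value, costs)
print(sorted(chosen), pay)
```

In this toy example each selected owner is paid the highest cost they could have reported while still being selected, which is the threshold form a Myerson-style payment takes. The payment does not depend on the owner's own report as long as they remain selected, so misreporting cannot raise it.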
Cite
Text
Fan et al. "From Fairness to Truthfulness: Rethinking Data Valuation Design." ICLR 2025 Workshops: Data_Problems, 2025.
Markdown
[Fan et al. "From Fairness to Truthfulness: Rethinking Data Valuation Design." ICLR 2025 Workshops: Data_Problems, 2025.](https://mlanthology.org/iclrw/2025/fan2025iclrw-fairness/)
BibTeX
@inproceedings{fan2025iclrw-fairness,
title = {{From Fairness to Truthfulness: Rethinking Data Valuation Design}},
author = {Fan, Dongyang and Rotello, Tyler J. and Karimireddy, Sai Praneeth},
booktitle = {ICLR 2025 Workshops: Data_Problems},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/fan2025iclrw-fairness/}
}