From Fairness to Truthfulness: Rethinking Data Valuation Design

Abstract

As large language models increasingly rely on external data sources, fairly compensating data contributors has become a central concern. In this paper, we revisit the design of data markets through a game-theoretic lens, where data owners face private, heterogeneous costs for data sharing. We show that commonly used valuation methods, such as Leave-One-Out and Data Shapley, fail to ensure truthful reporting of these costs, leading to inefficient market outcomes. To address this, we adapt well-established payment rules from mechanism design, namely Myerson and Vickrey-Clarke-Groves (VCG), to the data market setting. We demonstrate that the Myerson payment is the minimal truthful payment mechanism, optimal from the buyer’s perspective, and that VCG and Myerson payments coincide in unconstrained allocation settings. Our findings highlight the importance of incorporating incentive compatibility into data valuation, paving the way for more robust and efficient data markets.
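As a rough illustration of the last claim, the following toy sketch compares a Myerson-style threshold payment with a VCG payment in a deliberately simplified market. It is not the paper's model: it assumes the buyer's value is additive across sellers, that there is no budget or cardinality constraint on which sellers are chosen, and that ties are broken in favor of buying. All names (`select`, `threshold_payments`, `vcg_payments`, `seller_utility`) and the numbers are hypothetical.

```python
from typing import Dict, List

Values = Dict[str, int]  # buyer's value for each seller's data (assumed additive)
Costs = Dict[str, int]   # each seller's reported cost of sharing


def select(values: Values, costs: Costs) -> List[str]:
    """Welfare-maximizing choice with additive values and no constraint:
    buy from a seller exactly when the value covers the reported cost."""
    return [i for i in values if values[i] >= costs[i]]


def welfare(values: Values, costs: Costs, chosen: List[str]) -> int:
    """Total value minus total reported cost over the chosen sellers."""
    return sum(values[i] - costs[i] for i in chosen)


def threshold_payments(values: Values, costs: Costs) -> Dict[str, int]:
    """Myerson-style payment: the highest cost a selected seller could have
    reported while still being selected. In this toy model that is values[i]."""
    return {i: values[i] for i in select(values, costs)}


def vcg_payments(values: Values, costs: Costs) -> Dict[str, int]:
    """VCG payment: welfare with seller i (not counting i's own cost)
    minus the best achievable welfare when i is absent."""
    chosen = select(values, costs)
    total = welfare(values, costs, chosen)
    payments = {}
    for i in chosen:
        others = {j: v for j, v in values.items() if j != i}
        without_i = welfare(others, costs, select(others, costs))
        payments[i] = (total + costs[i]) - without_i
    return payments


def seller_utility(i: str, true_cost: int, report: int,
                   values: Values, costs: Costs) -> int:
    """Seller i's utility under threshold payments when reporting `report`
    while the true cost is `true_cost`; unselected sellers get zero."""
    reported = {**costs, i: report}
    payment = threshold_payments(values, reported).get(i)
    return 0 if payment is None else payment - true_cost


# Hypothetical example data.
values = {"a": 10, "b": 4, "c": 7}
costs = {"a": 6, "b": 5, "c": 7}
```

In this toy instance the two payment rules agree for every selected seller, and no misreport by seller `a` yields a higher utility than reporting the true cost. With a non-additive utility or a constrained allocation, the threshold would have to be computed from the allocation rule itself rather than read off `values[i]`.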

Cite

Text

Fan et al. "From Fairness to Truthfulness: Rethinking Data Valuation Design." ICLR 2025 Workshops: Data_Problems, 2025.

Markdown

[Fan et al. "From Fairness to Truthfulness: Rethinking Data Valuation Design." ICLR 2025 Workshops: Data_Problems, 2025.](https://mlanthology.org/iclrw/2025/fan2025iclrw-fairness/)

BibTeX

@inproceedings{fan2025iclrw-fairness,
  title     = {{From Fairness to Truthfulness: Rethinking Data Valuation Design}},
  author    = {Fan, Dongyang and Rotello, Tyler J. and Karimireddy, Sai Praneeth},
  booktitle = {ICLR 2025 Workshops: Data_Problems},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/fan2025iclrw-fairness/}
}