Competition over Data: How Does Data Purchase Affect Users?

Abstract

Competition among machine learning (ML) predictors is widespread in practice, so it is increasingly important to understand the effects and biases that arise from it. One critical aspect of ML competition is that predictors are constantly updated by acquiring additional data while they compete. Although this active data acquisition can substantially affect the overall competition environment, it has not been well studied. In this paper, we study what happens when ML predictors can purchase additional data during the competition. We introduce a new environment in which ML predictors use active learning algorithms to acquire labeled data effectively within their budgets while competing against each other. We empirically show that the overall performance of an ML predictor improves when predictors can purchase additional labeled data. Surprisingly, however, the quality that users experience (i.e., the accuracy of the predictor selected by each user) can decrease even as the individual predictors get better. We demonstrate that this phenomenon arises naturally from a trade-off: competition pushes each predictor to specialize in a subset of the population, while data purchase makes predictors more uniform. With comprehensive experiments, we show that our findings are robust to different modeling assumptions.
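
The distinction between a predictor's overall performance and the quality that users experience can be made concrete with a toy calculation. The sketch below is illustrative only: the correctness matrices are made-up values rather than results from the paper, and the selection rule (each user picks whichever predictor serves them best) is an idealized assumption, not necessarily the user model the authors use. The function names `overall_accuracy` and `user_experienced_quality` are hypothetical.

```python
def overall_accuracy(correct):
    """Accuracy of each predictor averaged over the whole population."""
    return [sum(row) / len(row) for row in correct]


def user_experienced_quality(correct):
    """Mean accuracy users receive, assuming (idealized) that each user
    selects the predictor that is best for them."""
    n_users = len(correct[0])
    best_per_user = [max(row[i] for row in correct) for i in range(n_users)]
    return sum(best_per_user) / n_users


# Rows are predictors, columns are users; 1 means a correct prediction.
# Made-up values: two predictors that each specialize in half the users.
specialized = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
]
# Made-up values: after buying data, both predictors are better overall
# but make the same mistakes on the same users.
uniform = [
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
]

for name, correct in [("specialized", specialized), ("uniform", uniform)]:
    print(name, overall_accuracy(correct), user_experienced_quality(correct))
```

In this toy setting, each predictor's overall accuracy rises from 0.5 to 0.8, while the quality users experience falls from 1.0 to 0.8, because users can no longer find a predictor that covers the cases the other one misses.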

Cite

Text

Kwon et al. "Competition over Data: How Does Data Purchase Affect Users?" Transactions on Machine Learning Research, 2022.

Markdown

[Kwon et al. "Competition over Data: How Does Data Purchase Affect Users?" Transactions on Machine Learning Research, 2022.](https://mlanthology.org/tmlr/2022/kwon2022tmlr-competition/)

BibTeX

@article{kwon2022tmlr-competition,
  title     = {{Competition over Data: How Does Data Purchase Affect Users?}},
  author    = {Kwon, Yongchan and Ginart, Tony A and Zou, James},
  journal   = {Transactions on Machine Learning Research},
  year      = {2022},
  url       = {https://mlanthology.org/tmlr/2022/kwon2022tmlr-competition/}
}