Data Valuation in Machine Learning: "Ingredients", Strategies, and Open Challenges

Abstract

Data valuation in machine learning (ML) is an emerging research area that studies the worth of data in ML. Data valuation is used in collaborative ML to determine a fair compensation for every data owner and in interpretable ML to identify the most responsible, noisy, or misleading training examples. This paper presents a comprehensive technical survey that provides a new formal study of data valuation in ML through its “ingredients” and the corresponding properties, grounds the discussion of common desiderata satisfied by existing data valuation strategies on our proposed ingredients, and identifies open research challenges for designing new ingredients, data valuation strategies, and cost reduction techniques.

Cite

Text

Sim et al. "Data Valuation in Machine Learning: 'Ingredients', Strategies, and Open Challenges." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/782

Markdown

[Sim et al. "Data Valuation in Machine Learning: 'Ingredients', Strategies, and Open Challenges." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/sim2022ijcai-data/) doi:10.24963/IJCAI.2022/782

BibTeX

@inproceedings{sim2022ijcai-data,
  title     = {{Data Valuation in Machine Learning: "Ingredients", Strategies, and Open Challenges}},
  author    = {Sim, Rachael Hwee Ling and Xu, Xinyi and Low, Bryan Kian Hsiang},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {5607--5614},
  doi       = {10.24963/IJCAI.2022/782},
  url       = {https://mlanthology.org/ijcai/2022/sim2022ijcai-data/}
}