Data Valuation in Machine Learning: "Ingredients", Strategies, and Open Challenges
Abstract
Data valuation in machine learning (ML) is an emerging research area that studies the worth of data in ML. Data valuation is used in collaborative ML to determine a fair compensation for every data owner and in interpretable ML to identify the most responsible, noisy, or misleading training examples. This paper presents a comprehensive technical survey that provides a new formal study of data valuation in ML through its “ingredients” and the corresponding properties, grounds the discussion of common desiderata satisfied by existing data valuation strategies on our proposed ingredients, and identifies open research challenges for designing new ingredients, data valuation strategies, and cost reduction techniques.
Cite
Text
Sim et al. "Data Valuation in Machine Learning: "Ingredients", Strategies, and Open Challenges." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/782
Markdown
[Sim et al. "Data Valuation in Machine Learning: "Ingredients", Strategies, and Open Challenges." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/sim2022ijcai-data/) doi:10.24963/IJCAI.2022/782
BibTeX
@inproceedings{sim2022ijcai-data,
title = {{Data Valuation in Machine Learning: "Ingredients", Strategies, and Open Challenges}},
author = {Sim, Rachael Hwee Ling and Xu, Xinyi and Low, Bryan Kian Hsiang},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2022},
pages = {5607--5614},
doi = {10.24963/IJCAI.2022/782},
url = {https://mlanthology.org/ijcai/2022/sim2022ijcai-data/}
}