DAVINZ: Data Valuation Using Deep Neural Networks at Initialization
Abstract
Recent years have witnessed a surge of interest in developing trustworthy methods to evaluate the value of data in many real-world applications (e.g., collaborative machine learning, data marketplaces). Existing data valuation methods typically valuate data using the generalization performance of converged machine learning models after their long-term model training, hence making data valuation on large complex deep neural networks (DNNs) unaffordable. To this end, we theoretically derive a domain-aware generalization bound to estimate the generalization performance of DNNs without model training. We then exploit this theoretically derived generalization bound to develop a novel training-free data valuation method named data valuation at initialization (DAVINZ) on DNNs, which consistently achieves remarkable effectiveness and efficiency in practice. Moreover, our training-free DAVINZ, surprisingly, can even theoretically and empirically enjoy the desirable properties that training-based data valuation methods usually attain, thus making it more trustworthy in practice.
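To make the abstract's core idea concrete, the sketch below (Python/NumPy) illustrates a DAVINZ-style, training-free score for a data subset: an NTK-based term y^T K^{-1} y computed from a randomly initialized network, combined with an MMD term measuring the subset's distance to a validation set. This is a minimal sketch under assumptions of this write-up only (the tiny MLP, the coefficients kappa1/kappa2, and the RBF bandwidth are illustrative), not the paper's exact formulation or implementation.

```python
# Illustrative sketch (not the authors' implementation) of a training-free,
# DAVINZ-style data value: score a data subset with an NTK-based term
# y^T K^{-1} y computed at initialization plus an MMD term measuring how far
# the subset is from a validation set. The MLP, kappa1/kappa2, and the RBF
# bandwidth are assumptions made for this sketch only.
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hidden=64):
    """Randomly initialized 2-layer MLP (no training anywhere below)."""
    return {
        "W1": rng.normal(0, 1 / np.sqrt(d_in), (d_in, d_hidden)),
        "W2": rng.normal(0, 1 / np.sqrt(d_hidden), (d_hidden, 1)),
    }

def grad_features(params, X):
    """Per-example gradient of the scalar output w.r.t. all parameters.
    The empirical NTK is the Gram matrix of these features."""
    H = np.tanh(X @ params["W1"])                      # (n, h)
    dH = 1.0 - H ** 2                                  # tanh'(pre-activation)
    g_W2 = H                                           # d out / d W2
    g_W1 = (dH * params["W2"].T)[:, None, :] * X[:, :, None]  # (n, d, h)
    return np.concatenate([g_W1.reshape(len(X), -1), g_W2], axis=1)

def mmd_rbf(X, Z, gamma=1.0):
    """Squared MMD with an RBF kernel between subset X and validation Z."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Z, Z).mean() - 2 * k(X, Z).mean()

def davinz_style_score(params, X_s, y_s, X_val, kappa1=1.0, kappa2=1.0):
    """Higher score = lower estimated generalization bound at initialization."""
    G = grad_features(params, X_s)
    K = G @ G.T + 1e-6 * np.eye(len(X_s))              # empirical NTK Gram
    ntk_term = np.sqrt(y_s @ np.linalg.solve(K, y_s) / len(X_s))
    return -(kappa1 * ntk_term + kappa2 * mmd_rbf(X_s, X_val))

# Toy usage: value two candidate subsets without any training.
d = 5
X_val = rng.normal(size=(30, d))
X_a, y_a = rng.normal(size=(20, d)), rng.normal(size=20)              # off-domain
X_b, y_b = X_val[:20] + 0.1 * rng.normal(size=(20, d)), rng.normal(size=20)
params = init_mlp(d)
print("subset A:", davinz_style_score(params, X_a, y_a, X_val))
print("subset B:", davinz_style_score(params, X_b, y_b, X_val))
```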
Cite
Text
Wu et al. "DAVINZ: Data Valuation Using Deep Neural Networks at Initialization." International Conference on Machine Learning, 2022.Markdown
[Wu et al. "DAVINZ: Data Valuation Using Deep Neural Networks at Initialization." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/wu2022icml-davinz/)BibTeX
@inproceedings{wu2022icml-davinz,
  title     = {{DAVINZ: Data Valuation Using Deep Neural Networks at Initialization}},
  author    = {Wu, Zhaoxuan and Shu, Yao and Low, Bryan Kian Hsiang},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {24150--24176},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/wu2022icml-davinz/}
}