Data Attribution: A Data-Centric Approach for Trustworthy AI Development

Abstract

Data plays an increasingly crucial role in both the performance and the safety of AI models. Data attribution is an emerging family of techniques that quantify the impact of individual training data points on the model trained on them. These techniques have found data-centric applications such as instance-based explanation, unsafe training data detection, and copyright compensation. In this talk, I will review our work on the applications, methods, and open-source benchmarks of data attribution, and discuss open challenges in this field.

Cite

Text

Ma. "Data Attribution: A Data-Centric Approach for Trustworthy AI Development." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I27.35114

Markdown

[Ma. "Data Attribution: A Data-Centric Approach for Trustworthy AI Development." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/ma2025aaai-data/) doi:10.1609/AAAI.V39I27.35114

BibTeX

@inproceedings{ma2025aaai-data,
  title     = {{Data Attribution: A Data-Centric Approach for Trustworthy AI Development}},
  author    = {Ma, Jiaqi},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {28720},
  doi       = {10.1609/AAAI.V39I27.35114},
  url       = {https://mlanthology.org/aaai/2025/ma2025aaai-data/}
}