Interpretation Attacks and Defenses on Predictive Models Using Electronic Health Records
Abstract
The emergence of complex deep neural networks has made it crucial to employ interpretation methods for gaining insight into the rationale behind model predictions. However, recent studies have revealed attacks on these interpretations, which aim to deceive users and subvert the trustworthiness of the models. Such attacks are especially critical in medical systems, where interpretations are essential in explaining outcomes. This paper presents the first interpretation attack on predictive models using sequential electronic health records (EHRs). Prior attempts in image interpretation mainly utilized gradient-based methods, yet our research shows that our attack can attain significant success on EHR interpretations that do not rely on model gradients. We introduce metrics compatible with EHR data to evaluate the attack’s success. Moreover, our findings demonstrate that detection methods that have successfully identified conventional adversarial examples are ineffective against our attack. We then propose a defense method utilizing auto-encoders to de-noise the data and improve the interpretations’ robustness. Our results indicate that this de-noising method outperforms the widely used defense method, SmoothGrad, which is based on adding noise to the data.
Cite
Text
Razmi et al. "Interpretation Attacks and Defenses on Predictive Models Using Electronic Health Records." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43418-1_27
Markdown
[Razmi et al. "Interpretation Attacks and Defenses on Predictive Models Using Electronic Health Records." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/razmi2023ecmlpkdd-interpretation/) doi:10.1007/978-3-031-43418-1_27
BibTeX
@inproceedings{razmi2023ecmlpkdd-interpretation,
title = {{Interpretation Attacks and Defenses on Predictive Models Using Electronic Health Records}},
author = {Razmi, Fereshteh and Lou, Jian and Hong, Yuan and Xiong, Li},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2023},
pages = {446-461},
doi = {10.1007/978-3-031-43418-1_27},
url = {https://mlanthology.org/ecmlpkdd/2023/razmi2023ecmlpkdd-interpretation/}
}