Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy

Abstract

Malicious users attempt to replicate commercial models functionally at low cost by training a clone model on query responses. Preventing such model-stealing attacks in a timely manner, while achieving strong protection and maintaining utility, is challenging. In this paper, we propose a novel non-parametric detector called Account-aware Distribution Discrepancy (ADD) that recognizes queries from malicious users by leveraging account-wise local dependency. We formulate each class as a Multivariate Normal distribution (MVN) in the feature space and compute the malicious score as a weighted sum of class-wise distribution discrepancies. The ADD detector is combined with random-based prediction poisoning to yield a plug-and-play defense module named D-ADD for image classification models. Extensive experiments show that D-ADD achieves strong defense against different types of attacks with little interference in serving benign users, in both soft-label and hard-label settings.
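The scoring idea described in the abstract can be sketched roughly as follows. This is a minimal illustration only, assuming squared Mahalanobis distance as the class-wise discrepancy measure and the model's predicted class probabilities as the per-class weights; the function names, the regularization term, and the exact weighting scheme are hypothetical and not taken from the paper.

```python
import numpy as np

def fit_class_mvns(features, labels):
    """Fit a per-class MVN (mean, inverse covariance) in the feature space."""
    mvns = {}
    for c in np.unique(labels):
        X = features[labels == c]
        mu = X.mean(axis=0)
        # Small ridge term keeps the covariance invertible (an assumption here).
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        mvns[c] = (mu, np.linalg.inv(cov))
    return mvns

def account_malicious_score(query_feats, class_probs, mvns):
    """Weighted sum of class-wise discrepancies over one account's queries.

    query_feats: (n, d) features of the account's recent queries.
    class_probs: (n, num_classes) predicted class probabilities, used as weights.
    """
    score = 0.0
    for c, (mu, inv_cov) in mvns.items():
        d = query_feats - mu
        # Squared Mahalanobis distance of each query to class c's MVN.
        maha = np.einsum('ij,jk,ik->i', d, inv_cov, d)
        score += np.mean(class_probs[:, c] * maha)
    return score
```

Under this sketch, an account whose queries drift away from all class distributions (as attack queries often do) accumulates a higher score than an account issuing in-distribution queries.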

Cite

Text

Mei et al. "Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I1.32041

Markdown

[Mei et al. "Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/mei2025aaai-defense/) doi:10.1609/AAAI.V39I1.32041

BibTeX

@inproceedings{mei2025aaai-defense,
  title     = {{Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy}},
  author    = {Mei, Jian-Ping and Zhang, Weibin and Chen, Jie and Zhang, Xuyun and Zhu, Tiantian},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {604-611},
  doi       = {10.1609/AAAI.V39I1.32041},
  url       = {https://mlanthology.org/aaai/2025/mei2025aaai-defense/}
}