Detecting Generative Model Inversion Attacks for Protecting Intellectual Property of Deep Neural Networks
Abstract
Recently, protecting the Intellectual Property (IP) of deep neural networks (DNNs) has attracted attention from researchers, because training DNN models can be costly, especially when acquiring and labeling training data requires domain expertise. DNN watermarking and fingerprinting are two techniques proposed to prevent DNN IP infringement. Although both techniques defend well against previously proposed DNN stealing attacks, researchers have recently shown that neither is effective against generative model inversion attacks. Specifically, an adversary inverts training data from a well-trained DNN and uses the inverted data to train a DNN from scratch, so that DNN watermarking and fingerprinting are both bypassed. This novel model stealing strategy shows that data inverted from victim models can be effectively exploited by adversaries, posing a new threat to the IP protection of DNNs. To combat this new threat, one potential solution is to enable defenders to prove ownership of data inverted from the models being protected: if the training data of a suspect model, which can be disclosed via the judicial process, are proven to be inverted from a victim model, then IP infringement is detected. This research direction is currently underexplored, and in this paper we fill this gap in the literature by investigating countermeasures against this emerging threat. We propose a simple but effective method, called InverseDataInspector (IDI), to detect whether data are inverted from victim models. Specifically, our method first extracts features from both the inverted data and the victim models; these features are then combined and used to train classifiers. Experimental results demonstrate that our method achieves high performance in detecting inverted data and also generalizes to new generative model inversion methods that are not seen when training the classifiers.
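To make the detection pipeline in the abstract concrete, below is a minimal sketch of the idea: extract features from the suspect data and from the victim model's behavior on that data, concatenate them, and train a binary classifier. This is our illustration, not the authors' implementation; the feature extractors (`extract_data_features`, `extract_model_features`), the classifier choice, and all names are assumptions made for the example.

```python
# Illustrative sketch of an IDI-style detector: combine data-side and
# model-side features, then train a classifier. NOT the authors' code;
# the features and classifier below are placeholder assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def extract_data_features(images: np.ndarray) -> np.ndarray:
    """Hypothetical per-image features (here: simple intensity statistics)."""
    flat = images.reshape(len(images), -1)
    return np.stack([flat.mean(axis=1), flat.std(axis=1)], axis=1)

def extract_model_features(victim_logits: np.ndarray) -> np.ndarray:
    """Hypothetical features from the victim model's outputs on the images,
    e.g. top-class confidence and prediction entropy."""
    probs = np.exp(victim_logits) / np.exp(victim_logits).sum(axis=1, keepdims=True)
    top = probs.max(axis=1)
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.stack([top, ent], axis=1)

def build_features(images: np.ndarray, victim_logits: np.ndarray) -> np.ndarray:
    # Combine both feature sources, as the abstract describes.
    return np.hstack([extract_data_features(images),
                      extract_model_features(victim_logits)])

# Toy usage with random stand-ins: label 1 = inverted data, 0 = benign data.
rng = np.random.default_rng(0)
images = rng.random((200, 3, 32, 32))
logits = rng.normal(size=(200, 10))
labels = rng.integers(0, 2, size=200)

X = build_features(images, logits)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```

On real data, the classifier would be trained on features from known inverted samples (produced by existing inversion attacks against the victim model) versus benign samples; generalization to unseen inversion methods is then evaluated on attacks held out of training.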
Cite
Text
Yu et al. "Detecting Generative Model Inversion Attacks for Protecting Intellectual Property of Deep Neural Networks." Journal of Artificial Intelligence Research, 2025. doi:10.1613/JAIR.1.19468
Markdown
[Yu et al. "Detecting Generative Model Inversion Attacks for Protecting Intellectual Property of Deep Neural Networks." Journal of Artificial Intelligence Research, 2025.](https://mlanthology.org/jair/2025/yu2025jair-detecting/) doi:10.1613/JAIR.1.19468
BibTeX
@article{yu2025jair-detecting,
title = {{Detecting Generative Model Inversion Attacks for Protecting Intellectual Property of Deep Neural Networks}},
author = {Yu, Yiding and Zong, Wei and Su, Wenjing and Chow, Yang-Wai and Susilo, Willy},
journal = {Journal of Artificial Intelligence Research},
year = {2025},
doi = {10.1613/JAIR.1.19468},
volume = {84},
url = {https://mlanthology.org/jair/2025/yu2025jair-detecting/}
}