The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses
Abstract
We formalize and analyze the trade-off between backdoor-based watermarks and adversarial defenses, framing it as an interactive protocol between a verifier and a prover. While previous works have primarily focused on this trade-off, our analysis extends it by identifying transferable attacks as a third, counterintuitive but necessary option. Our main result shows that for all learning tasks, at least one of the three exists: a watermark, an adversarial defense, or a transferable attack. By transferable attack, we refer to an efficient algorithm that generates queries indistinguishable from the data distribution and capable of fooling _all_ efficient defenders. Using cryptographic techniques, specifically fully homomorphic encryption, we construct a transferable attack and prove its necessity in this trade-off. Furthermore, we show that any task that satisfies our notion of a transferable attack implies a cryptographic primitive, thus requiring the underlying task to be computationally complex. Finally, we show that tasks of bounded VC-dimension allow adversarial defenses against all attackers, while a subclass allows watermarks secure against fast adversaries.
Cite
Text
Gluch et al. "The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses." ICLR 2025 Workshops: WMARK, 2025.
Markdown
[Gluch et al. "The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses." ICLR 2025 Workshops: WMARK, 2025.](https://mlanthology.org/iclrw/2025/gluch2025iclrw-good/)
BibTeX
@inproceedings{gluch2025iclrw-good,
title = {{The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses}},
author = {Gluch, Grzegorz and Turan, Berkant and Nagarajan, Sai Ganesh and Pokutta, Sebastian},
booktitle = {ICLR 2025 Workshops: WMARK},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/gluch2025iclrw-good/}
}