Characterizing the Optimal $0-1$ Loss for Multi-Class Classification with a Test-Time Attacker

Abstract

Finding classifiers robust to adversarial examples is critical for their safe deployment. Determining the robustness of the best possible classifier under a given threat model for a fixed data distribution and comparing it to that achieved by state-of-the-art training methods is thus an important diagnostic tool. In this paper, we find achievable information-theoretic lower bounds on robust loss in the presence of a test-time attacker for *multi-class classifiers on any discrete dataset*. We provide a general framework for finding the optimal $0-1$ loss that revolves around the construction of a conflict hypergraph from the data and adversarial constraints. The prohibitive cost of this formulation in practice leads us to formulate other variants of the attacker-classifier game that more efficiently determine the range of the optimal loss. Our evaluation shows, for the first time, an analysis of the gap to optimal robustness for classifiers in the multi-class setting on benchmark datasets.
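
The sketch below illustrates the conflict-hypergraph idea described in the abstract, restricted to pairwise conflicts (hyperedges of size 2) for readability; the full framework in the paper uses larger hyperedges. The dataset, $\ell_\infty$ attack budget, uniform empirical distribution, and helper names here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: pairwise conflict graph + LP lower bound on the optimal
# 0-1 robust loss. All names and modeling choices below are assumptions
# made for illustration, not the paper's exact formulation.
import numpy as np
from scipy.optimize import linprog


def pairwise_conflicts(X, y, eps):
    """Return pairs (i, j) of differently-labeled points whose l_inf
    eps-balls overlap, i.e. an attacker can map both to the same input."""
    n = len(X)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] != y[j] and np.max(np.abs(X[i] - X[j])) <= 2 * eps:
                edges.append((i, j))
    return edges


def optimal_loss_lower_bound(X, y, eps):
    """LP lower bound on the optimal 0-1 robust loss (pairwise relaxation).

    Variable q_i is the probability that point i is classified correctly
    under attack. Each conflict (i, j) forces q_i + q_j <= 1, since no
    classifier can be correct on both at the shared adversarial input.
    We maximize the average q_i; loss = 1 - (mean correctness).
    """
    n = len(X)
    edges = pairwise_conflicts(X, y, eps)
    c = -np.ones(n) / n  # minimize negative mean correctness
    A_ub = np.zeros((len(edges), n))
    for row, (i, j) in enumerate(edges):
        A_ub[row, i] = A_ub[row, j] = 1.0
    b_ub = np.ones(len(edges))
    res = linprog(c,
                  A_ub=A_ub if len(edges) else None,
                  b_ub=b_ub if len(edges) else None,
                  bounds=[(0, 1)] * n,
                  method="highs")
    return 1.0 + res.fun


# Toy usage: one overlapping pair of opposite-class points and one
# well-separated point; the bound comes out to 1/3.
X = np.array([[0.0], [0.05], [1.0]])
y = np.array([0, 1, 1])
print(optimal_loss_lower_bound(X, y, eps=0.1))
```

Keeping only size-2 hyperedges drops constraints from the LP, so the resulting value is a valid lower bound on the loss given by the full hypergraph formulation; richer hyperedges tighten the bound at higher computational cost.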

Cite

Text

Dai et al. "Characterizing the Optimal $0-1$ Loss for Multi-Class Classification with a Test-Time Attacker." Neural Information Processing Systems, 2023.

Markdown

[Dai et al. "Characterizing the Optimal $0-1$ Loss for Multi-Class Classification with a Test-Time Attacker." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/dai2023neurips-characterizing/)

BibTeX

@inproceedings{dai2023neurips-characterizing,
  title     = {{Characterizing the Optimal $0-1$ Loss for Multi-Class Classification with a Test-Time Attacker}},
  author    = {Dai, Sihui and Ding, Wenxin and Bhagoji, Arjun Nitin and Cullina, Daniel and Zheng, Heather and Zhao, Ben and Mittal, Prateek},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/dai2023neurips-characterizing/}
}