Surrogate Regret Bounds for Generalized Classification Performance Metrics

Abstract

We consider optimization of generalized performance metrics for binary classification by means of surrogate losses. We focus on a class of metrics, which are linear-fractional functions of the false positive and false negative rates (examples of which include Fβ\documentclass[12pt]minimal \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}-69pt \begin{document}$F_{\beta }$\end{document}-measure, Jaccard similarity coefficient, AM measure, and many others). Our analysis concerns the following two-step procedure. First, a real-valued function f is learned by minimizing a surrogate loss for binary classification on the training sample. It is assumed that the surrogate loss is a strongly proper composite loss function (examples of which include logistic loss, squared-error loss, exponential loss, etc.). Then, given f, a threshold θ^\documentclass[12pt]minimal \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}-69pt \begin{document}$\widehat{\theta }$\end{document} is tuned on a separate validation sample, by direct optimization of the target performance metric. We show that the regret of the resulting classifier (obtained from thresholding f on θ^\documentclass[12pt]minimal \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}-69pt \begin{document}$\widehat{\theta }$\end{document}) measured with respect to the target metric is upperbounded by the regret of f measured with respect to the surrogate loss. We also extend our results to cover multilabel classification and provide regret bounds for micro- and macro-averaging measures. Our findings are further analyzed in a computational study on both synthetic and real data sets.

Cite

Text

Kotlowski and Dembczynski. "Surrogate Regret Bounds for Generalized Classification Performance Metrics." Machine Learning, 2017. doi:10.1007/S10994-016-5591-7

Markdown

[Kotlowski and Dembczynski. "Surrogate Regret Bounds for Generalized Classification Performance Metrics." Machine Learning, 2017.](https://mlanthology.org/mlj/2017/kotlowski2017mlj-surrogate/) doi:10.1007/S10994-016-5591-7

BibTeX

@article{kotlowski2017mlj-surrogate,
  title     = {{Surrogate Regret Bounds for Generalized Classification Performance Metrics}},
  author    = {Kotlowski, Wojciech and Dembczynski, Krzysztof},
  journal   = {Machine Learning},
  year      = {2017},
  pages     = {549-572},
  doi       = {10.1007/S10994-016-5591-7},
  volume    = {106},
  url       = {https://mlanthology.org/mlj/2017/kotlowski2017mlj-surrogate/}
}