Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability

Abstract

In this paper, we revisit the problem of distribution-independently learning halfspaces under Massart noise with rate $\eta$. Recent work resolved a long-standing problem in this model of efficiently learning to error $\eta + \epsilon$ for any $\epsilon > 0$, by giving an improper learner that partitions space into $\text{poly}(d,1/\epsilon)$ regions. Here we give a much simpler algorithm and settle a number of outstanding open questions: (1) We give the first \emph{proper} learner for Massart halfspaces that achieves $\eta + \epsilon$. (2) Based on (1), we develop a blackbox knowledge distillation procedure to convert an arbitrarily complex classifier to an equally good proper classifier. (3) By leveraging a simple but overlooked connection to \emph{evolvability}, we show any SQ algorithm requires super-polynomially many queries to achieve $\mathsf{OPT} + \epsilon$. We then zoom out to study generalized linear models and give an efficient algorithm for learning under a challenging new corruption model generalizing Massart noise. Finally we study our algorithm for learning halfspaces under Massart noise empirically and find that it exhibits some appealing fairness properties as a byproduct of its strong provable robustness guarantees.

Cite

Text

Chen et al. "Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability." Neural Information Processing Systems, 2020.

Markdown

[Chen et al. "Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/chen2020neurips-classification/)

BibTeX

@inproceedings{chen2020neurips-classification,
  title     = {{Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability}},
  author    = {Chen, Sitan and Koehler, Frederic and Moitra, Ankur and Yau, Morris},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/chen2020neurips-classification/}
}