The Implicit Bias of Gradient Descent on Separable Multiclass Data

Abstract

Implicit bias describes the phenomenon where optimization-based training algorithms, without explicit regularization, show a preference for simple estimators even when more complex estimators have equal objective values. Multiple works have developed the theory of implicit bias for binary classification under the assumption that the loss satisfies an exponential tail property. However, there is a noticeable gap in analysis for multiclass classification, with only a handful of results which themselves are restricted to the cross-entropy loss. In this work, we employ the framework of Permutation Equivariant and Relative Margin-based (PERM) losses [Wang and Scott, 2024] to introduce a multiclass extension of the exponential tail property. This class of losses contains cross-entropy as well as other multiclass losses. Using this framework, we extend the implicit bias result of Soudry et al. [2018] to multiclass classification. Furthermore, our proof techniques closely mirror those of the binary case, thus illustrating the power of the PERM framework for bridging the binary-multiclass gap.
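The implicit bias phenomenon the abstract refers to can be seen in a small experiment. The sketch below (not from the paper; the toy data, learning rate, and iteration counts are hypothetical choices for illustration) runs plain full-batch gradient descent with the logistic loss, a standard exponentially-tailed loss, on linearly separable binary data. The weight norm diverges while the weight *direction* stabilizes, which is the behavior characterized by Soudry et al. [2018].

```python
# Minimal sketch of implicit bias: gradient descent on separable binary
# data with the logistic (exponentially-tailed) loss. The norm of w grows
# without bound, but w/||w|| converges in direction.
import math

# Hypothetical linearly separable toy data: (features, label in {+1, -1}).
DATA = [((2.0, 1.0), 1), ((3.0, 2.0), 1),
        ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_step(w, lr=0.1):
    """One full-batch gradient descent step on sum_i log(1 + exp(-y_i w.x_i))."""
    g = [0.0, 0.0]
    for (x, y) in DATA:
        margin = y * (w[0] * x[0] + w[1] * x[1])
        c = sigmoid(-margin)  # weight on example i in the gradient
        g[0] -= y * x[0] * c
        g[1] -= y * x[1] * c
    return [w[0] - lr * g[0], w[1] - lr * g[1]]

def direction(w):
    n = math.hypot(w[0], w[1])
    return [w[0] / n, w[1] / n]

w = [0.0, 0.0]
snapshots = {}
for t in range(1, 5001):
    w = grad_step(w)
    if t in (1000, 5000):
        snapshots[t] = list(w)
# Even though the loss has no minimizer, the direction of w stabilizes
# (toward the max-margin separator), while ||w|| keeps growing like log t.
```

On this data the iterates separate all examples, the norm keeps increasing between the two snapshots, and the normalized iterates at steps 1000 and 5000 are nearly aligned; the multiclass result in the paper establishes the analogous directional convergence for PERM losses with the exponential tail property.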

Cite

Text

Ravi et al. "The Implicit Bias of Gradient Descent on Separable Multiclass Data." Neural Information Processing Systems, 2024. doi:10.52202/079017-2585

Markdown

[Ravi et al. "The Implicit Bias of Gradient Descent on Separable Multiclass Data." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/ravi2024neurips-implicit/) doi:10.52202/079017-2585

BibTeX

@inproceedings{ravi2024neurips-implicit,
  title     = {{The Implicit Bias of Gradient Descent on Separable Multiclass Data}},
  author    = {Ravi, Hrithik and Scott, Clayton and Soudry, Daniel and Wang, Yutong},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2585},
  url       = {https://mlanthology.org/neurips/2024/ravi2024neurips-implicit/}
}