Learning Curves of Classification Metrics Based on Confusion Matrices
Abstract
Learning curves of classification metrics, including test error, precision (P), recall (R), and F$_1$ score, as functions of training set size are an active topic in developing advanced methods for model selection and hyperparameter optimization. Existing studies have concentrated on formulating the functional shapes of well-behaved learning curves of test error under a normality assumption. However, the normality assumption is unreasonable for learning curves of classification metrics because the distributions of most classification metrics, such as P, R, and F$_1$ score, are skewed, and interval estimates of the metrics based on the normality assumption may fall outside [0,1]. In this study, since most classification metrics are obtained from confusion matrices, we develop a novel method to formulate the learning curves of classification metrics by assuming that the four entries in a confusion matrix jointly follow a multinomial distribution rather than a normal distribution. Furthermore, each entry in a confusion matrix is formulated as an exponential function of the training set size. Thus, the learning curve of a classification metric can be naturally obtained by transforming the functions of the confusion matrix entries according to the definition of the metric. Moreover, reasonable confidence bands of several popular metrics, including test error, P, R, and F$_1$ score, are derived in this study based on the assumption of the multinomial distribution of a confusion matrix. Extensive experiments are conducted on several synthetic and real-world data sets coupled with multiple typical non-neural and neural classification algorithms. Experimental results illustrate the improvements of the proposed learning curves of test error, P, R, and F$_1$ score and the superiority of the confidence bands.
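The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formulation: the abstract does not give the exact exponential form or the analytic band derivation, so the function `cell_probs`, its parameters (`pos_rate`, `floor`, `gap`, `rate`), and the Monte Carlo percentile band in `multinomial_band` are all assumptions made here for illustration. The sketch shows only that metrics computed from multinomially distributed confusion-matrix counts yield intervals that stay inside [0,1].

```python
import numpy as np

def cell_probs(n_train, pos_rate=0.5, floor=(0.05, 0.10), gap=(0.20, 0.25), rate=0.01):
    """Hypothetical cell probabilities (TP, FP, FN, TN) at training size n_train.

    Assumption: FP and FN probabilities decay exponentially toward fixed floors;
    TP and TN take the remaining mass of the positive and negative class.
    """
    decay = np.exp(-rate * n_train)
    fp = floor[0] + gap[0] * decay
    fn = floor[1] + gap[1] * decay
    return np.array([pos_rate - fn, fp, fn, 1.0 - pos_rate - fp])

def metrics(cells):
    """Error, precision, recall and F1 from cells ordered (TP, FP, FN, TN)."""
    tp, fp, fn, tn = np.moveaxis(np.asarray(cells, dtype=float), -1, 0)
    # Zero denominators become NaN instead of raising warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        return {
            "error": (fp + fn) / (tp + fp + fn + tn),
            "precision": tp / (tp + fp),
            "recall": tp / (tp + fn),
            "f1": 2 * tp / (2 * tp + fp + fn),
        }

def multinomial_band(probs, n_test, level=0.95, n_draws=20000, seed=0):
    """Monte Carlo percentile band for each metric under a multinomial model."""
    rng = np.random.default_rng(seed)
    draws = rng.multinomial(n_test, probs, size=n_draws)  # shape (n_draws, 4)
    alpha = (1.0 - level) / 2.0
    return {k: np.nanquantile(v, [alpha, 1.0 - alpha])
            for k, v in metrics(draws).items()}

# Trace a hypothetical F1 learning curve with its band at three training sizes.
for n in (50, 200, 800):
    p = cell_probs(n)
    point = metrics(p)["f1"]
    lo, hi = multinomial_band(p, n_test=200)["f1"]
    print(f"n={n}: F1={point:.3f}, band=[{lo:.3f}, {hi:.3f}]")
```

Because every simulated metric is a ratio of non-negative counts, the percentile band cannot leave [0,1], unlike a symmetric normal-approximation interval centered near 0 or 1.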
Cite
Text
Xue et al. "Learning Curves of Classification Metrics Based on Confusion Matrices." Proceedings of the 17th Asian Conference on Machine Learning, 2025.
Markdown
[Xue et al. "Learning Curves of Classification Metrics Based on Confusion Matrices." Proceedings of the 17th Asian Conference on Machine Learning, 2025.](https://mlanthology.org/acml/2025/xue2025acml-learning/)
BibTeX
@inproceedings{xue2025acml-learning,
title = {{Learning Curves of Classification Metrics Based on Confusion Matrices}},
author = {Xue, Yan and Wang, Ruibo and Cao, Xuefei and Yang, Jing and Li, Jihong},
booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
year = {2025},
pages = {1246-1261},
volume = {304},
url = {https://mlanthology.org/acml/2025/xue2025acml-learning/}
}