Novel Applications of Item Response Theory for Analysing Data Set Complexity and Benchmark Selection
Abstract
Item response theory (IRT) was developed in psychometrics to measure the latent skills of human respondents based on their observed responses to items of varying difficulty. In IRT, a respondent's ability is high when they correctly answer difficult items, despite occasional mistakes on easy ones. IRT has recently been framed as a powerful tool for characterising instance hardness in classification problems, measuring the difficulty and discrimination of the instances in a data set based on the correctness of a set of classifiers. Here, we generalise this concept to the data set level by taking a pool of 509 classification data sets and assessing their difficulty and discrimination based on the performance achieved by 95 classifiers when solving these problems. Ability is estimated such that high values are assigned to classifiers that perform well on hard data sets. We further evaluate IRT in two distinct applications. First, we build a regression meta-model in which complexity measures are used to predict the IRT parameters of new data sets without retraining the IRT model. Second, we propose two IRT-based benchmarks of 30 data sets each for testing classifiers, one selected for diversity and another for greater difficulty. Both benchmarks may be used to evaluate new methods more broadly than the common practice of gathering random data sets from public repositories.
Cite
Text
Pereira et al. "Novel Applications of Item Response Theory for Analysing Data Set Complexity and Benchmark Selection." Machine Learning, 2025. doi:10.1007/S10994-025-06873-3
Markdown
[Pereira et al. "Novel Applications of Item Response Theory for Analysing Data Set Complexity and Benchmark Selection." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/pereira2025mlj-novel/) doi:10.1007/S10994-025-06873-3
BibTeX
@article{pereira2025mlj-novel,
title = {{Novel Applications of Item Response Theory for Analysing Data Set Complexity and Benchmark Selection}},
author = {Pereira, João Luiz Junho and de Queiroz, Alfredo Antonio Alencar Exposito and de Menezes e Silva Filho, Telmo and Lorena, Ana Carolina and Mantovani, Rafael Gomes and Pappa, Gisele Lobo and Prudêncio, Ricardo Bastos Cavalcante},
journal = {Machine Learning},
year = {2025},
pages = {222},
doi = {10.1007/S10994-025-06873-3},
volume = {114},
url = {https://mlanthology.org/mlj/2025/pereira2025mlj-novel/}
}