Oops, I Sampled It Again: Reinterpreting Confidence Intervals in Few-Shot Learning
Abstract
The predominant method for computing confidence intervals (CIs) in few-shot learning (FSL) samples tasks with replacement, i.e. it allows the same samples to appear in multiple tasks. This makes the resulting CIs misleading: they account for the randomness of the sampler but not for the randomness of the data itself. To quantify the extent of this problem, we conduct a comparative analysis between CIs computed with and without replacement. This analysis reveals that the predominant method notably underestimates the CIs, which calls for a reevaluation of how we interpret confidence intervals and of the conclusions drawn from them in FSL comparative studies. We show that the use of paired tests can partially address this issue. Additionally, we explore methods to further reduce the size of the CI by strategically sampling tasks of a specific size. We also introduce a new optimized benchmark, available at https://github.com/RafLaf/FSL-benchmark-again.
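The toy simulation below is a minimal sketch, not the authors' code, of the distinction the abstract draws. It rests on a loudly stated simplifying assumption: each sample in a fixed, finite evaluation pool has a fixed 0/1 correctness flag, whereas in real FSL correctness also depends on the support set drawn for each task. The pool size (600), query size (15), number of tasks (10,000), the two methods' accuracies and every function name (`simulate`, `tasks_with_replacement`, `paired_halfwidth`, ...) are hypothetical choices made for illustration. The CI is the usual normal approximation, 1.96 times the standard deviation of per-task accuracies divided by the square root of the number of tasks. `paired_halfwidth` shows a standard paired comparison on per-task differences, which is not necessarily the exact test used in the paper.

```python
import math
import random
import statistics

Z95 = 1.96  # normal-approximation quantile for a two-sided 95% interval


def ci_halfwidth(values):
    """Half-width of the usual 95% CI: 1.96 * std / sqrt(number of tasks)."""
    return Z95 * statistics.stdev(values) / math.sqrt(len(values))


def tasks_with_replacement(pool_size, query_size, n_tasks, rng):
    """Draw each task independently from the whole pool.

    Indices are distinct within a task, but the same index can recur across tasks.
    """
    return [rng.sample(range(pool_size), query_size) for _ in range(n_tasks)]


def tasks_without_replacement(pool_size, query_size, rng):
    """Shuffle once and cut the pool into disjoint tasks: no sample appears twice."""
    order = list(range(pool_size))
    rng.shuffle(order)
    n_tasks = pool_size // query_size
    return [order[k * query_size:(k + 1) * query_size] for k in range(n_tasks)]


def task_accuracies(correct, tasks):
    """Per-task accuracy, assuming a fixed 0/1 correctness flag per pool sample."""
    return [sum(correct[i] for i in task) / len(task) for task in tasks]


def paired_halfwidth(correct_a, correct_b, tasks):
    """95% CI half-width on per-task accuracy differences of two methods on the same tasks."""
    acc_a = task_accuracies(correct_a, tasks)
    acc_b = task_accuracies(correct_b, tasks)
    return ci_halfwidth([a - b for a, b in zip(acc_a, acc_b)])


def simulate(seed=0, pool_size=600, query_size=15, n_tasks=10_000):
    rng = random.Random(seed)
    # Hypothetical method A: correct on roughly 70% of the fixed evaluation pool.
    correct_a = [int(rng.random() < 0.7) for _ in range(pool_size)]
    # Hypothetical method B: agrees with A on most samples, as two similar methods would.
    correct_b = [a if rng.random() < 0.9 else int(rng.random() < 0.7) for a in correct_a]

    with_rep = tasks_with_replacement(pool_size, query_size, n_tasks, rng)
    disjoint = tasks_without_replacement(pool_size, query_size, rng)
    half_a = ci_halfwidth(task_accuracies(correct_a, disjoint))
    half_b = ci_halfwidth(task_accuracies(correct_b, disjoint))
    return {
        "with_replacement": ci_halfwidth(task_accuracies(correct_a, with_rep)),
        "without_replacement": half_a,
        # Unpaired: treat the two methods' mean accuracies as independent estimates.
        "unpaired_diff": math.hypot(half_a, half_b),
        # Paired: use per-task differences, so variation shared by both methods cancels.
        "paired_diff": paired_halfwidth(correct_a, correct_b, disjoint),
        "n_disjoint_tasks": len(disjoint),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>20}: {value}")
```

In this toy setting, the with-replacement half-width keeps shrinking roughly as one over the square root of `n_tasks`, even though the pool never grows, because it only measures the sampler's randomness. The disjoint-task half-width is limited by how many tasks the finite pool can supply, and the paired half-width on the difference is smaller than the unpaired one because the two hypothetical methods succeed and fail on largely the same samples.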
Cite
Lafargue et al. "Oops, I Sampled It Again: Reinterpreting Confidence Intervals in Few-Shot Learning." Transactions on Machine Learning Research, 2024.
@article{lafargue2024tmlr-oops,
title = {{Oops, I Sampled It Again: Reinterpreting Confidence Intervals in Few-Shot Learning}},
author = {Lafargue, Raphael and Smith, Luke A and Vermet, Franck and Löwe, Matthias and Reid, Ian and Valmadre, Jack and Gripon, Vincent},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/lafargue2024tmlr-oops/}
}