The Effect of Instance-Space Partition on Significance

Abstract

This paper demonstrates experimentally that judging which induction algorithm is more accurate from the results of a single partition of the instances into cross-validation folds may lead to statistically erroneous conclusions. Comparing two decision tree induction algorithms and one naive Bayes induction algorithm, we find situations in which one algorithm is judged more accurate at the p = 0.05 level with one partition of the training instances, but the other algorithm is judged more accurate at the p = 0.05 level with an alternate partition. We recommend a new significance procedure that involves performing cross-validation using multiple instance-space partitions. Significance is determined by applying the paired Student t-test separately to the results from each cross-validation partition, averaging the resulting values, and converting this averaged value into a significance value.
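The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state exactly which values are averaged or how the average is converted to a significance value, so the sketch assumes that the per-partition t statistics are averaged and that the mean is referred to a Student t distribution with (folds - 1) degrees of freedom. The function names and the numerical integration of the t density are assumptions introduced here for illustration.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(acc_a, acc_b):
    """Paired Student t statistic over per-fold accuracies of two algorithms."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("fold differences are constant; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson integration of the density."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def averaged_significance(partitions):
    """partitions: list of (acc_a, acc_b) per-fold accuracy lists, one pair per partition.

    Assumption: every partition uses the same number of folds.
    """
    t_values = [paired_t_statistic(a, b) for a, b in partitions]
    t_bar = mean(t_values)  # assumed: the t statistics are what gets averaged
    df = len(partitions[0][0]) - 1
    return t_bar, two_sided_p(t_bar, df)


# Made-up accuracies for two partitions of a 5-fold cross-validation.
example = [
    ([0.81, 0.79, 0.84, 0.80, 0.83], [0.78, 0.80, 0.79, 0.77, 0.80]),
    ([0.80, 0.82, 0.78, 0.81, 0.79], [0.81, 0.80, 0.80, 0.79, 0.80]),
]
t_bar, p_value = averaged_significance(example)
print(f"mean t = {t_bar:.3f}, two-sided p = {p_value:.3f}")
```

Averaging across partitions means that a single partition with an extreme t value, in either direction, cannot by itself decide the outcome, which is the failure mode the abstract reports for single-partition comparisons.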

Cite

Text

Bradford and Brodley. "The Effect of Instance-Space Partition on Significance." Machine Learning, 2001. doi:10.1023/A:1007613918580

Markdown

[Bradford and Brodley. "The Effect of Instance-Space Partition on Significance." Machine Learning, 2001.](https://mlanthology.org/mlj/2001/bradford2001mlj-effect/) doi:10.1023/A:1007613918580

BibTeX

@article{bradford2001mlj-effect,
  title     = {{The Effect of Instance-Space Partition on Significance}},
  author    = {Bradford, Jeffrey P. and Brodley, Carla E.},
  journal   = {Machine Learning},
  year      = {2001},
  pages     = {269-286},
  doi       = {10.1023/A:1007613918580},
  volume    = {42},
  url       = {https://mlanthology.org/mlj/2001/bradford2001mlj-effect/}
}