A Boo(n) for Evaluating Architecture Performance

Abstract

We point out important problems with the common practice of using the best single model performance for comparing deep learning architectures, and we propose a method that corrects these flaws. Each time a model is trained, one may get a different result due to random factors in the training process, which include random parameter initialization and random data shuffling. Reporting the best single model performance does not appropriately address this stochasticity. We propose a normalized expected best-out-of-$n$ performance ($\text{Boo}_n$) as a way to correct these problems.
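To make the metric concrete, below is a minimal sketch of estimating the expected best-out-of-$n$ performance from $m$ observed training runs. It uses the standard order-statistics estimator: with results sorted in ascending order, the $i$-th smallest value is the maximum of a uniformly random $n$-subset with probability $\binom{i-1}{n-1}/\binom{m}{n}$. The function name `expected_best_of_n` and the sample accuracies are illustrative assumptions, and the normalization step that gives $\text{Boo}_n$ its "normalized" qualifier is not reproduced here.

```python
import math

def expected_best_of_n(results, n):
    """Estimate E[max of n i.i.d. runs] from m >= n observed run results.

    Sorting the results in ascending order, the i-th smallest value is the
    maximum of a uniformly random n-subset with probability
    C(i-1, n-1) / C(m, n), so the expectation is a weighted sum over order
    statistics rather than the max of a single arbitrary n-subset.
    """
    m = len(results)
    if n > m:
        raise ValueError("need at least n observed runs to estimate best-of-n")
    xs = sorted(results)  # ascending; assumes higher scores are better
    total = math.comb(m, n)
    # math.comb(i - 1, n - 1) is zero for i < n, so early terms drop out.
    return sum(math.comb(i - 1, n - 1) * x for i, x in enumerate(xs, start=1)) / total

# Hypothetical validation accuracies from five training runs of one architecture.
accuracies = [0.712, 0.698, 0.705, 0.721, 0.690]
print(expected_best_of_n(accuracies, n=3))  # expected best of 3 random runs
```

Averaging over all $n$-subsets of the observed runs gives a lower-variance estimate than reporting the single best of one arbitrary group of $n$ runs.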

Cite

Text

Bajgar et al. "A Boo(n) for Evaluating Architecture Performance." International Conference on Machine Learning, 2018.

Markdown

[Bajgar et al. "A Boo(n) for Evaluating Architecture Performance." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/bajgar2018icml-boo/)

BibTeX

@inproceedings{bajgar2018icml-boo,
  title     = {{A Boo(n) for Evaluating Architecture Performance}},
  author    = {Bajgar, Ondrej and Kadlec, Rudolf and Kleindienst, Jan},
  booktitle = {International Conference on Machine Learning},
  year      = {2018},
  pages     = {334--343},
  volume    = {80},
  url       = {https://mlanthology.org/icml/2018/bajgar2018icml-boo/}
}