The Myth of the Pyramid

Abstract

A deep-rooted strategy for building convolutional neural networks in computer vision is to increase the number of filters every time the feature map resolution is decreased. The notion behind this pyramidal design is that a higher number of filters increases the expressivity of the network, compensating for the losses caused by lower resolutions. This paper challenges the practice by testing a set of varied filter distributions, named filter templates, on popular CNN architectures (VGG, ResNet, MobileNet and MnasNet). The experimental results show that the superiority of the pyramidal design holds on the ImageNet dataset but fails on other datasets such as MNIST, CIFAR and TinyImageNet, and on other tasks such as audio classification. CNN models with different filter distributions deliver higher accuracy with reduced resource consumption, suggesting that the pyramidal design has been optimised for ImageNet and that each model-dataset pair benefits from tuning the number and distribution of filters. To further illustrate the benefits of exploring other distributions, this paper shows that, by using templates, the best-performing model from the NASBench101 dataset can exceed the accuracy of its original pyramidal design with up to 68 per cent fewer parameters. Overall, our experiments point to new opportunities for model designers to find more efficient models.
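To make the idea of redistributing filters concrete, the following is a minimal sketch and not the authors' implementation. The template names (`pyramid`, `reverse`, `uniform`), the helper functions, and the four-stage base configuration are all assumptions chosen for illustration; the abstract does not specify which templates the paper uses. The sketch counts only the weights of a plain stack of 3x3 convolutions.

```python
def make_template(base, name):
    """Redistribute per-stage filter counts (hypothetical template names)."""
    if name == "pyramid":   # conventional design: filters grow with depth
        return list(base)
    if name == "reverse":   # assumed variant: filters shrink with depth
        return list(reversed(base))
    if name == "uniform":   # assumed variant: same filter count in every stage
        mean = sum(base) // len(base)
        return [mean] * len(base)
    raise ValueError(f"unknown template: {name}")


def conv_params(filters, in_channels=3, kernel=3):
    """Weight count (biases ignored) of a plain stack of kernel x kernel convs."""
    total = 0
    prev = in_channels
    for f in filters:
        total += kernel * kernel * prev * f
        prev = f
    return total


if __name__ == "__main__":
    base = [64, 128, 256, 512]  # illustrative pyramidal stage widths
    for name in ("pyramid", "reverse", "uniform"):
        filters = make_template(base, name)
        print(name, filters, conv_params(filters))
```

Parameter count is only one resource measure. Because early stages operate on larger feature maps, moving filters toward them changes compute and memory use differently from the parameter count, which is one reason each template would have to be evaluated per model-dataset pair.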

Cite

Text

Izquierdo-Cordova and Mayol-Cuevas. "The Myth of the Pyramid." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00036

Markdown

[Izquierdo-Cordova and Mayol-Cuevas. "The Myth of the Pyramid." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/izquierdocordova2024cvprw-myth/) doi:10.1109/CVPRW63382.2024.00036

BibTeX

@inproceedings{izquierdocordova2024cvprw-myth,
  title     = {{The Myth of the Pyramid}},
  author    = {Izquierdo-Cordova, Ramon and Mayol-Cuevas, Walterio W.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {311-321},
  doi       = {10.1109/CVPRW63382.2024.00036},
  url       = {https://mlanthology.org/cvprw/2024/izquierdocordova2024cvprw-myth/}
}