How Rare Events Shape the Learning Curves of Hierarchical Data
Abstract
The learning curves of deep learning methods often decay as a power law of the dataset size. The theoretical understanding of the corresponding exponent yields fundamental insights about the learning problem. However, it is still limited to extremely simple datasets and idealised learning scenarios, such as the lazy regime where the network acts as a kernel method. Recent works study how deep networks learn synthetic classification tasks generated by probabilistic context-free grammars: generative processes which model the hierarchical and compositional structure of language and images. Previous studies assumed composition rules to be equally likely, leading to non-power-law behavior for classification. In realistic datasets, instead, some rules may be much rarer than others. By assuming that the probabilities of these rules follow a Zipf law with exponent $a$, we show that the classification error of deep neural networks decays as a power $\alpha\,{=}\,a/(1+a)$ of the number of training examples, with a large multiplicative constant that depends on the hierarchical structure of the data.
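A minimal numerical sketch (not from the paper) of the mechanism behind this exponent: assuming rule probabilities of the form p_k ∝ k^(-(1+a)) (one common convention for a Zipf law with exponent a; the exact parametrisation used by the authors may differ), the probability mass of rules never observed among P training examples decays approximately as P^(-a/(1+a)), matching the exponent stated above. All parameter values below are illustrative.

# Illustrative sketch, not the authors' code: measure how the unobserved
# probability mass of Zipf-distributed rules decays with the number of samples.
import numpy as np

a = 1.0                       # assumed Zipf exponent
K = 100_000                   # number of distinct composition rules (illustrative)
ranks = np.arange(1, K + 1)
p = ranks ** (-(1.0 + a))     # assumed parametrisation p_k ∝ k^{-(1+a)}
p /= p.sum()

rng = np.random.default_rng(0)
sizes = [10**2, 10**3, 10**4, 10**5]
missing_mass = []
for P in sizes:
    counts = rng.multinomial(P, p)              # draw P training examples
    missing_mass.append(p[counts == 0].sum())   # mass of rules never observed

# Fit the decay exponent; it should come out close to a / (1 + a) = 0.5 here.
slope, _ = np.polyfit(np.log(sizes), np.log(missing_mass), 1)
print(f"measured exponent ~ {-slope:.2f}, predicted a/(1+a) = {a / (1 + a):.2f}")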
Cite
Text
Kang et al. "How Rare Events Shape the Learning Curves of Hierarchical Data." NeurIPS 2024 Workshops: SciForDL, 2024.
Markdown
[Kang et al. "How Rare Events Shape the Learning Curves of Hierarchical Data." NeurIPS 2024 Workshops: SciForDL, 2024.](https://mlanthology.org/neuripsw/2024/kang2024neuripsw-rare/)
BibTeX
@inproceedings{kang2024neuripsw-rare,
  title = {{How Rare Events Shape the Learning Curves of Hierarchical Data}},
  author = {Kang, Hyunmo and Cagnetta, Francesco and Wyart, Matthieu},
  booktitle = {NeurIPS 2024 Workshops: SciForDL},
  year = {2024},
  url = {https://mlanthology.org/neuripsw/2024/kang2024neuripsw-rare/}
}