Exploring Model Depth and Data Complexity Through the Lens of Cellular Automata
Abstract
Large language models excel at solving complex tasks, owing to their hierarchical architecture that enables the implementation of sophisticated algorithms through layered computations. In this work, we study the interplay between model depth and data complexity using elementary cellular automata (ECA) datasets. We demonstrate empirically that, given a fixed parameter count, deeper networks consistently outperform shallower variants. Our findings reveal that complex ECA rules require a deeper model to emulate. Finally, analysis of attention score patterns elucidates why shallower networks struggle to effectively emulate complex rules.
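To make the setting concrete, below is a minimal sketch of how an elementary cellular automaton trajectory (the kind of sequence data the paper trains on) can be generated. The specific rule number, grid width, and function names here are illustrative choices, not details taken from the paper.

```python
def eca_step(state, rule):
    """Apply one step of an elementary CA rule to a 1-D binary state
    with periodic boundary conditions."""
    n = len(state)
    new_state = []
    for i in range(n):
        # Each cell's next value depends on its left neighbor, itself,
        # and its right neighbor (a 3-bit neighborhood).
        left, center, right = state[(i - 1) % n], state[i], state[(i + 1) % n]
        idx = (left << 2) | (center << 1) | right
        # The rule number's binary expansion is the lookup table:
        # bit `idx` of `rule` gives the cell's next state.
        new_state.append((rule >> idx) & 1)
    return new_state

def eca_run(state, rule, steps):
    """Iterate the automaton, returning the full trajectory
    (a list of states, one per time step)."""
    trajectory = [state]
    for _ in range(steps):
        state = eca_step(state, rule)
        trajectory.append(state)
    return trajectory

# Example: Rule 110 (known to be Turing-complete) from a single live cell.
initial = [0] * 11
initial[5] = 1
history = eca_run(initial, rule=110, steps=5)
```

Rules differ sharply in the complexity of the patterns they generate (Rule 110 above produces complex, aperiodic structure, while e.g. Rule 0 maps everything to zero), which is what makes ECA a convenient knob for varying data complexity against model depth.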
Cite
Text
He et al. "Exploring Model Depth and Data Complexity Through the Lens of Cellular Automata." NeurIPS 2024 Workshops: SciForDL, 2024.

Markdown

[He et al. "Exploring Model Depth and Data Complexity Through the Lens of Cellular Automata." NeurIPS 2024 Workshops: SciForDL, 2024.](https://mlanthology.org/neuripsw/2024/he2024neuripsw-exploring/)

BibTeX
@inproceedings{he2024neuripsw-exploring,
  title     = {{Exploring Model Depth and Data Complexity Through the Lens of Cellular Automata}},
  author    = {He, Tianyu and Doshi, Darshil and Das, Aritra and Gromov, Andrey},
  booktitle = {NeurIPS 2024 Workshops: SciForDL},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/he2024neuripsw-exploring/}
}