Exploring Model Depth and Data Complexity Through the Lens of Cellular Automata
Abstract
Large language models excel at solving complex tasks, owing to their hierarchical architecture that enables the implementation of sophisticated algorithms through layered computations. In this work, we study the interplay between model depth and data complexity using elementary cellular automata (ECA) datasets. We demonstrate empirically that, given a fixed parameter count, deeper networks consistently outperform shallower variants. Our findings reveal that complex ECA rules require a deeper model to emulate. Finally, analysis of attention score patterns elucidates why shallower networks struggle to effectively emulate complex rules.
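To make the setting concrete, below is a minimal sketch of how an elementary cellular automaton trajectory (the kind of sequence data the paper trains on) can be generated. The specific rule number, grid width, and function names here are illustrative choices, not details taken from the paper.

```python
def eca_step(state, rule):
    """Apply one step of an elementary CA rule to a 1-D binary state
    with periodic boundary conditions."""
    n = len(state)
    new_state = []
    for i in range(n):
        # Each cell's next value depends on its left neighbor, itself,
        # and its right neighbor (a 3-bit neighborhood).
        left, center, right = state[(i - 1) % n], state[i], state[(i + 1) % n]
        idx = (left << 2) | (center << 1) | right
        # The rule number's binary expansion is the lookup table:
        # bit `idx` of `rule` gives the cell's next state.
        new_state.append((rule >> idx) & 1)
    return new_state

def eca_run(state, rule, steps):
    """Iterate the automaton, returning the full trajectory
    (a list of states, one per time step)."""
    trajectory = [state]
    for _ in range(steps):
        state = eca_step(state, rule)
        trajectory.append(state)
    return trajectory

# Example: Rule 110 (known to be Turing-complete) from a single live cell.
initial = [0] * 11
initial[5] = 1
history = eca_run(initial, rule=110, steps=5)
```

Rules differ sharply in the complexity of the patterns they generate (Rule 110 above produces complex, aperiodic structure, while e.g. Rule 0 maps everything to zero), which is what makes ECA a convenient knob for varying data complexity against model depth.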
Cite
Text
He et al. "Exploring Model Depth and Data Complexity Through the Lens of Cellular Automata." NeurIPS 2024 Workshops: SciForDL, 2024.

Markdown

[He et al. "Exploring Model Depth and Data Complexity Through the Lens of Cellular Automata." NeurIPS 2024 Workshops: SciForDL, 2024.](https://mlanthology.org/neuripsw/2024/he2024neuripsw-exploring/)

BibTeX
@inproceedings{he2024neuripsw-exploring,
  title     = {{Exploring Model Depth and Data Complexity Through the Lens of Cellular Automata}},
  author    = {He, Tianyu and Doshi, Darshil and Das, Aritra and Gromov, Andrey},
  booktitle = {NeurIPS 2024 Workshops: SciForDL},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/he2024neuripsw-exploring/}
}