Understanding Simplicity Bias Towards Compositional Mappings via Learning Dynamics

Abstract

Obtaining compositional mappings is important for models to generalize well compositionally. To better understand when and how to encourage a model to learn such mappings, we study their uniqueness from different perspectives. Specifically, we first show that compositional mappings are the simplest bijections through the lens of coding length (i.e., an upper bound on their Kolmogorov complexity). This property explains why models that learn such mappings can generalize well. We further show that this simplicity bias is usually an intrinsic property of training neural networks via gradient descent, which partially explains why some models spontaneously generalize well when trained appropriately.
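
The coding-length intuition can be illustrated with a minimal sketch (not code from the paper): for a toy domain of color/shape attribute pairs, a compositional bijection is fully specified by small per-attribute dictionaries, whereas a generic (holistic) bijection needs its entire lookup table; a compressed table size serves as a rough empirical proxy for coding length. The attribute names, token scheme, and use of zlib here are assumptions made purely for illustration.

```python
# Illustrative sketch, assuming a toy color/shape domain (not the paper's setup).
import itertools
import json
import random
import zlib

n_colors, n_shapes = 8, 8
colors = [f"color{i}" for i in range(n_colors)]
shapes = [f"shape{j}" for j in range(n_shapes)]
inputs = list(itertools.product(colors, shapes))  # 64 attribute combinations

# Compositional mapping: the output factorizes into one token per attribute.
color_token = {c: f"C{i}" for i, c in enumerate(colors)}
shape_token = {s: f"S{j}" for j, s in enumerate(shapes)}
compositional = {f"{c}|{s}": color_token[c] + shape_token[s] for c, s in inputs}

# Holistic mapping: same output set, but paired with inputs at random, so no
# per-attribute structure can be exploited when describing it.
random.seed(0)
shuffled = list(compositional.values())
random.shuffle(shuffled)
holistic = {f"{c}|{s}": o for (c, s), o in zip(inputs, shuffled)}

# Counting view: number of dictionary entries needed to specify each mapping.
print("compositional spec size:", len(color_token) + len(shape_token), "entries")
print("holistic spec size:     ", len(holistic), "entries")

# Compression view: a crude upper-bound proxy for Kolmogorov complexity; the
# more regular compositional table typically compresses to fewer bytes.
def compressed_bytes(mapping):
    return len(zlib.compress(json.dumps(mapping, sort_keys=True).encode(), 9))

print("compositional table:", compressed_bytes(compositional), "bytes compressed")
print("holistic table:     ", compressed_bytes(holistic), "bytes compressed")
```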

Cite

Text

Ren and Sutherland. "Understanding Simplicity Bias Towards Compositional Mappings via Learning Dynamics." NeurIPS 2024 Workshops: Compositional_Learning, 2024.

Markdown

[Ren and Sutherland. "Understanding Simplicity Bias Towards Compositional Mappings via Learning Dynamics." NeurIPS 2024 Workshops: Compositional_Learning, 2024.](https://mlanthology.org/neuripsw/2024/ren2024neuripsw-understanding/)

BibTeX

@inproceedings{ren2024neuripsw-understanding,
  title     = {{Understanding Simplicity Bias Towards Compositional Mappings via Learning Dynamics}},
  author    = {Ren, Yi and Sutherland, Danica J.},
  booktitle = {NeurIPS 2024 Workshops: Compositional_Learning},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/ren2024neuripsw-understanding/}
}