Symbolic Regression with a Learned Concept Library

Abstract

We present a novel method for symbolic regression (SR), the task of searching for compact programmatic hypotheses that best explain a dataset. The problem is commonly solved using genetic algorithms; we show that we can enhance such methods by inducing a library of abstract textual concepts. Our algorithm, called LaSR, uses zero-shot queries to a large language model (LLM) to discover and evolve concepts occurring in known high-performing hypotheses. We discover new hypotheses using a mix of standard evolutionary steps and LLM-guided steps (obtained through zero-shot LLM queries) conditioned on discovered concepts. Once discovered, hypotheses are used in a new round of concept abstraction and evolution. We validate LaSR on the Feynman equations, a popular SR benchmark, as well as a set of synthetic tasks. On these benchmarks, LaSR substantially outperforms a variety of state-of-the-art SR approaches based on deep learning and evolutionary algorithms. Moreover, we show that LaSR can be used to discover a novel scaling law for LLMs.
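
To make the loop in the abstract concrete, here is a minimal, runnable Python sketch of a concept-guided evolutionary search. Everything in it is a hypothetical stand-in rather than the authors' implementation: the real LaSR evolves symbolic expressions with a genetic-programming SR backend and issues zero-shot LLM queries, whereas here toy strings and stub functions (fitness, standard_evolution_step, llm_guided_step, llm_abstract_concepts) play both roles.

import random

random.seed(0)

def fitness(hypothesis: str, dataset) -> float:
    """Placeholder score: how well `hypothesis` explains `dataset` (toy objective)."""
    return -abs(len(hypothesis) - 10)

def standard_evolution_step(population: list[str]) -> str:
    """Classic genetic move: mutate a randomly chosen parent."""
    parent = random.choice(population)
    return parent + random.choice("xyz+*")

def llm_guided_step(population: list[str], concepts: list[str]) -> str:
    """Stand-in for a zero-shot LLM query conditioned on discovered concepts."""
    parent = random.choice(population)
    hint = random.choice(concepts) if concepts else ""
    return (parent + hint)[:12]

def llm_abstract_concepts(best: list[str]) -> list[str]:
    """Stand-in for LLM concept abstraction over top hypotheses."""
    return [h[:3] for h in best]  # pretend shared prefixes are "concepts"

def lasr(dataset, generations=20, pop_size=16, llm_ratio=0.25):
    population = ["x"] * pop_size
    concepts: list[str] = []
    for _ in range(generations):
        # Mix standard evolutionary steps with LLM-guided steps.
        children = []
        for _ in range(pop_size):
            if random.random() < llm_ratio:
                children.append(llm_guided_step(population, concepts))
            else:
                children.append(standard_evolution_step(population))
        population = sorted(population + children,
                            key=lambda h: fitness(h, dataset),
                            reverse=True)[:pop_size]
        # Abstract concepts from the current best hypotheses, closing the loop:
        # the new concepts condition the next round of LLM-guided mutations.
        concepts = llm_abstract_concepts(population[:4])
    return population[0]

print(lasr(dataset=None))

The key design point the sketch tries to capture is the feedback cycle: high-performing hypotheses are abstracted into textual concepts, and those concepts in turn bias a fraction of the evolutionary proposals in the next generation.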

Cite

Text

Grayeli et al. "Symbolic Regression with a Learned Concept Library." Neural Information Processing Systems, 2024. doi:10.52202/079017-1419

Markdown

[Grayeli et al. "Symbolic Regression with a Learned Concept Library." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/grayeli2024neurips-symbolic/) doi:10.52202/079017-1419

BibTeX

@inproceedings{grayeli2024neurips-symbolic,
  title     = {{Symbolic Regression with a Learned Concept Library}},
  author    = {Grayeli, Arya and Sehgal, Atharva and Costilla-Reyes, Omar and Cranmer, Miles and Chaudhuri, Swarat},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1419},
  url       = {https://mlanthology.org/neurips/2024/grayeli2024neurips-symbolic/}
}