Disambiguating Symbolic Expressions in Informal Documents

Abstract

We propose the task of disambiguating symbolic expressions in informal STEM documents written as LaTeX files (that is, determining their precise semantics and abstract syntax tree) and frame it as a neural machine translation task. We discuss the distinct challenges involved and present a dataset of roughly 33,000 entries. The baseline models we evaluated on this dataset overfitted before producing even syntactically valid LaTeX. We therefore describe a methodology using a transformer language model pre-trained on sources obtained from arxiv.org, which yields promising results despite the small size of the dataset. We evaluate our model using several dedicated techniques that take both the syntax and the semantics of symbolic expressions into account.
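To illustrate the most basic sense in which a model's output can fail to be "syntactically valid LaTeX", here is a minimal sketch. It assumes validity is approximated by balanced braces and correctly nested \begin{...}/\end{...} environments. The function name is_balanced_latex is hypothetical, and this crude check is not the evaluation procedure used in the paper.

```python
import re

# Hypothetical helper: a crude proxy for "syntactically valid LaTeX".
# It only checks that braces are balanced and that \begin{...}/\end{...}
# environments nest correctly; it does not run a real TeX parser.
_TOKEN = re.compile(r"\\begin\{([^}]*)\}|\\end\{([^}]*)\}|\\.|[{}]", re.DOTALL)


def is_balanced_latex(source: str) -> bool:
    """Return True if braces and environments in `source` are properly nested."""
    stack = []
    for match in _TOKEN.finditer(source):
        begin_env, end_env = match.group(1), match.group(2)
        token = match.group(0)
        if begin_env is not None:
            # Opening an environment such as \begin{equation}.
            stack.append(("env", begin_env))
        elif end_env is not None:
            # \end{...} must close the most recently opened environment.
            if not stack or stack[-1] != ("env", end_env):
                return False
            stack.pop()
        elif token == "{":
            stack.append(("brace", None))
        elif token == "}":
            # A closing brace must match an open brace, not an open environment.
            if not stack or stack[-1] != ("brace", None):
                return False
            stack.pop()
        # Any other match is an escaped character or control symbol
        # (e.g. \{ or \\), which is skipped so it does not affect nesting.
    return not stack


if __name__ == "__main__":
    print(is_balanced_latex(r"\frac{a}{b}"))    # True
    print(is_balanced_latex(r"\frac{a}{b"))     # False
```

A real evaluation of syntax would parse the output with a proper grammar; a stack-based check like this only catches the grossest failures, such as unclosed groups or mismatched environments.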

Cite

Text

Müller and Kaliszyk. "Disambiguating Symbolic Expressions in Informal Documents." International Conference on Learning Representations, 2021.

Markdown

[Müller and Kaliszyk. "Disambiguating Symbolic Expressions in Informal Documents." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/muller2021iclr-disambiguating/)

BibTeX

@inproceedings{muller2021iclr-disambiguating,
  title     = {{Disambiguating Symbolic Expressions in Informal Documents}},
  author    = {Müller, Dennis and Kaliszyk, Cezary},
  booktitle = {International Conference on Learning Representations},
  year      = {2021},
  url       = {https://mlanthology.org/iclr/2021/muller2021iclr-disambiguating/}
}