Disambiguating Symbolic Expressions in Informal Documents
Abstract
We propose the task of \emph{disambiguating} symbolic expressions in informal STEM documents in the form of \LaTeX files -- that is, determining their precise semantics and abstract syntax tree -- as a neural machine translation task. We discuss the distinct challenges involved and present a dataset with roughly 33,000 entries. Several baseline models that we evaluated on this dataset failed to yield even syntactically valid \LaTeX before overfitting. Consequently, we describe a methodology using a \emph{transformer} language model pre-trained on sources obtained from \url{arxiv.org}, which yields promising results despite the small size of the dataset. We evaluate our model using several dedicated techniques that take both the syntax and the semantics of symbolic expressions into account.
Cite
Text
Müller and Kaliszyk. "Disambiguating Symbolic Expressions in Informal Documents." International Conference on Learning Representations, 2021.
Markdown
[Müller and Kaliszyk. "Disambiguating Symbolic Expressions in Informal Documents." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/muller2021iclr-disambiguating/)
BibTeX
@inproceedings{muller2021iclr-disambiguating,
title = {{Disambiguating Symbolic Expressions in Informal Documents}},
author = {Müller, Dennis and Kaliszyk, Cezary},
booktitle = {International Conference on Learning Representations},
year = {2021},
url = {https://mlanthology.org/iclr/2021/muller2021iclr-disambiguating/}
}