LEGATO: Large-Scale End-to-End Generalizable Approach to Typeset OMR
Abstract
We propose Legato, a new end-to-end model for optical music recognition (OMR), the task of converting music score images to machine-readable documents. Legato is the first large-scale pretrained OMR model capable of recognizing full-page or multi-page typeset music scores and the first to generate documents in ABC notation, a concise, human-readable format for symbolic music. Bringing together a pretrained vision encoder with an ABC decoder trained on a dataset of more than 214K images, our model exhibits a strong ability to generalize across various typeset scores. We conduct comprehensive experiments on a range of datasets and metrics and demonstrate that Legato outperforms the previous state of the art. On our most realistic dataset, we see a 68% and 47.6% absolute error reduction on the standard metrics TEDn and OMR-NED, respectively.
Cite
Text
Yang et al. "LEGATO: Large-Scale End-to-End Generalizable Approach to Typeset OMR." International Conference on Learning Representations, 2026.
Markdown
[Yang et al. "LEGATO: Large-Scale End-to-End Generalizable Approach to Typeset OMR." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/yang2026iclr-legato/)
BibTeX
@inproceedings{yang2026iclr-legato,
title = {{LEGATO: Large-Scale End-to-End Generalizable Approach to Typeset OMR}},
author = {Yang, Guang and Ebert, Victoria and Tamer, Nazif Can and Zheng, Brian Siyuan and Pozzobon, Luiza Amador and Smith, Noah A.},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/yang2026iclr-legato/}
}