Making Latin Manuscripts Searchable Using gHMM's
Abstract
We describe a method that can make a scanned, handwritten mediaeval Latin manuscript accessible to full text search. A generalized HMM is fitted, using transcribed Latin to obtain a transition model and one example each of 22 letters to obtain an emission model. We show results for unigram, bigram and trigram models. Our method transcribes 25 pages of a manuscript of Terence with fair accuracy (75% of letters correctly transcribed). Search results are very strong; we use examples of variant spellings to demonstrate that the search respects the ink of the document. Furthermore, our model produces fair searches on a document from which we obtained no training data.
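To make the roles of the two models concrete, here is a minimal Python sketch, not the authors' implementation. It assumes an ordinary HMM over glyph positions that have already been segmented, which is a simplification: the generalized HMM named in the title is not reproduced here. The function names, the add-one smoothing, the two-letter alphabet, and the emission scores are all hypothetical and chosen only for illustration. The transition model is a bigram table counted from transcribed text, and Viterbi decoding combines it with per-position emission scores.

```python
import math


def bigram_log_transitions(text, alphabet, smoothing=1.0):
    """Estimate log P(next | prev) from transcribed text (hypothetical helper).

    Add-`smoothing` counts keep unseen letter pairs from getting zero probability.
    """
    counts = {a: {b: smoothing for b in alphabet} for a in alphabet}
    letters = [c for c in text.lower() if c in alphabet]
    for prev, nxt in zip(letters, letters[1:]):
        counts[prev][nxt] += 1
    log_trans = {}
    for a in alphabet:
        total = sum(counts[a].values())
        log_trans[a] = {b: math.log(counts[a][b] / total) for b in alphabet}
    return log_trans


def viterbi(log_emissions, log_trans, alphabet):
    """Return the most likely letter string for pre-segmented glyph positions.

    `log_emissions` is a list with one dict per position, mapping each letter
    to a log score for how well that glyph matches the letter's example.
    The start distribution is assumed uniform, so it is omitted.
    """
    best = {a: log_emissions[0][a] for a in alphabet}
    back = []
    for em in log_emissions[1:]:
        new_best, pointers = {}, {}
        for b in alphabet:
            prev = max(alphabet, key=lambda a: best[a] + log_trans[a][b])
            new_best[b] = best[prev] + log_trans[prev][b] + em[b]
            pointers[b] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final letter.
    path = [max(alphabet, key=lambda a: best[a])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


# Toy data (made up): a two-letter alphabet and a corpus that alternates letters.
alphabet = "ab"
log_trans = bigram_log_transitions("abababab", alphabet)

# The middle glyph is ambiguous on its own; the transition model has to decide.
log_emissions = [
    {"a": math.log(0.9), "b": math.log(0.1)},
    {"a": math.log(0.5), "b": math.log(0.5)},
    {"a": math.log(0.9), "b": math.log(0.1)},
]
decoded = viterbi(log_emissions, log_trans, alphabet)
print(decoded)
```

In this toy case the emission scores alone cannot resolve the middle position, so the decoded letter there comes entirely from the bigram statistics. A unigram or trigram variant would change only how `log_trans` is built and indexed.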
Cite
Text
Edwards et al. "Making Latin Manuscripts Searchable Using gHMM's." Neural Information Processing Systems, 2004.
Markdown
[Edwards et al. "Making Latin Manuscripts Searchable Using gHMM's." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/edwards2004neurips-making/)
BibTeX
@inproceedings{edwards2004neurips-making,
title = {{Making Latin Manuscripts Searchable Using gHMM's}},
author = {Edwards, Jaety and Teh, Yee W. and Bock, Roger and Maire, Michael and Vesom, Grace and Forsyth, David A.},
booktitle = {Neural Information Processing Systems},
year = {2004},
pages = {385-392},
url = {https://mlanthology.org/neurips/2004/edwards2004neurips-making/}
}