Learning Indexing Patterns from One Language for the Benefit of Others
Abstract
Using language technology for text analysis and light-weight ontologies as a content-mediating level, we acquire indexing patterns from vast amounts of indexing data for English-language medical documents. This is achieved by statistically relating interlingual representations of these documents (based on text token bigrams) to their associated index terms. From these ‘English’ indexing patterns, we then induce the associated index terms for German and Portuguese documents when their interlingual representations match those of English documents. Thus, we learn from past English indexing experience and transfer it in an unsupervised way to non-English texts, without ever having seen concrete indexing data for languages other than English.
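The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes documents have already been mapped to language-independent identifiers (the `#...` tokens below are hypothetical), and it uses a simple relative-frequency score as a stand-in for whatever statistical association the paper actually employs. The function names and the toy data are likewise invented for illustration.

```python
from collections import Counter, defaultdict


def bigrams(tokens):
    """Return adjacent pairs of (hypothetical) interlingual identifiers."""
    return list(zip(tokens, tokens[1:]))


def learn_patterns(training_docs):
    """Count how often each bigram co-occurs with each index term.

    training_docs: iterable of (interlingual_tokens, index_terms) pairs,
    here standing in for indexed English documents.
    """
    bigram_counts = Counter()
    cooccurrence = defaultdict(Counter)
    for tokens, index_terms in training_docs:
        # Count each bigram once per document.
        for bg in set(bigrams(tokens)):
            bigram_counts[bg] += 1
            for term in index_terms:
                cooccurrence[bg][term] += 1
    return bigram_counts, cooccurrence


def suggest_terms(tokens, bigram_counts, cooccurrence, top_k=3):
    """Rank index terms for a document in any language.

    Assumed scoring: sum over matching bigrams of the fraction of
    training documents containing that bigram that carried the term.
    """
    scores = Counter()
    for bg in set(bigrams(tokens)):
        if bg not in bigram_counts:
            continue  # bigram never seen in the English training data
        for term, n in cooccurrence[bg].items():
            scores[term] += n / bigram_counts[bg]
    return [term for term, _ in scores.most_common(top_k)]


# Toy "English" training data: interlingual tokens plus assigned index terms.
english_training = [
    (["#hepat", "#inflamm", "#chronic"], {"Hepatitis"}),
    (["#hepat", "#inflamm", "#acute"], {"Hepatitis"}),
    (["#cardi", "#infarct", "#acute"], {"Myocardial Infarction"}),
]
bigram_counts, cooccurrence = learn_patterns(english_training)

# A non-English document, assumed to be already mapped to the same identifiers.
german_doc = ["#chronic", "#hepat", "#inflamm"]
suggested = suggest_terms(german_doc, bigram_counts, cooccurrence)
```

In this toy setup the German document shares the bigram (`#hepat`, `#inflamm`) with two English training documents, so it inherits their index term even though no German indexing data was ever seen.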
Cite
Text
Hahn et al. "Learning Indexing Patterns from One Language for the Benefit of Others." AAAI Conference on Artificial Intelligence, 2004.
Markdown
[Hahn et al. "Learning Indexing Patterns from One Language for the Benefit of Others." AAAI Conference on Artificial Intelligence, 2004.](https://mlanthology.org/aaai/2004/hahn2004aaai-learning/)
BibTeX
@inproceedings{hahn2004aaai-learning,
title = {{Learning Indexing Patterns from One Language for the Benefit of Others}},
author = {Hahn, Udo and Markó, Kornél G. and Schulz, Stefan},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2004},
pages = {406-411},
url = {https://mlanthology.org/aaai/2004/hahn2004aaai-learning/}
}