Linear-Time Computation of Similarity Measures for Sequential Data
Abstract
Efficient and expressive comparison of sequences is an essential procedure for learning with sequential data. In this article we propose a generic framework for computation of similarity measures for sequences, covering various kernel, distance and non-metric similarity functions. The basis for comparison is embedding of sequences using a formal language, such as a set of natural words, k-grams or all contiguous subsequences. As realizations of the framework we provide linear-time algorithms of different complexity and capabilities using sorted arrays, tries and suffix trees as underlying data structures.
Cite
Text
Rieck and Laskov. "Linear-Time Computation of Similarity Measures for Sequential Data." Journal of Machine Learning Research, 2008.Markdown
[Rieck and Laskov. "Linear-Time Computation of Similarity Measures for Sequential Data." Journal of Machine Learning Research, 2008.](https://mlanthology.org/jmlr/2008/rieck2008jmlr-lineartime/)BibTeX
@article{rieck2008jmlr-lineartime,
title = {{Linear-Time Computation of Similarity Measures for Sequential Data}},
author = {Rieck, Konrad and Laskov, Pavel},
journal = {Journal of Machine Learning Research},
year = {2008},
pages = {23-48},
volume = {9},
url = {https://mlanthology.org/jmlr/2008/rieck2008jmlr-lineartime/}
}