Linear-Time Computation of Similarity Measures for Sequential Data

Abstract

Efficient and expressive comparison of sequences is an essential procedure for learning with sequential data. In this article we propose a generic framework for computation of similarity measures for sequences, covering various kernel, distance and non-metric similarity functions. The basis for comparison is embedding of sequences using a formal language, such as a set of natural words, k-grams or all contiguous subsequences. As realizations of the framework we provide linear-time algorithms of different complexity and capabilities using sorted arrays, tries and suffix trees as underlying data structures.

Cite

Text

Rieck and Laskov. "Linear-Time Computation of Similarity Measures for Sequential Data." Journal of Machine Learning Research, 2008.

Markdown

[Rieck and Laskov. "Linear-Time Computation of Similarity Measures for Sequential Data." Journal of Machine Learning Research, 2008.](https://mlanthology.org/jmlr/2008/rieck2008jmlr-lineartime/)

BibTeX

@article{rieck2008jmlr-lineartime,
  title     = {{Linear-Time Computation of Similarity Measures for Sequential Data}},
  author    = {Rieck, Konrad and Laskov, Pavel},
  journal   = {Journal of Machine Learning Research},
  year      = {2008},
  pages     = {23-48},
  volume    = {9},
  url       = {https://mlanthology.org/jmlr/2008/rieck2008jmlr-lineartime/}
}