Computation of Similarity Measures for Sequential Data Using Generalized Suffix Trees

Rieck, Konrad; Laskov, Pavel; Sonnenburg, Sören

NeurIPS 2006 pp. 1177-1184

Abstract

We propose a generic algorithm for computation of similarity measures for sequential data. The algorithm uses generalized suffix trees for efficient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subsequences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefficients for sequences as alternatives to classical kernel functions.
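To make the three families of measures in the abstract concrete, the following minimal sketch computes one kernel, one distance and one similarity coefficient over a k-gram embedding. It is not the paper's algorithm: it swaps the generalized suffix tree for plain hash-based k-gram counting, which is simpler but does not extend to the "all contained subsequences" embedding. The function names, the choice of the Manhattan distance and the Jaccard coefficient, and the example strings are all assumptions made for illustration.

```python
from collections import Counter


def kgram_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def linear_kernel(x, y, k):
    """Inner product of the two k-gram count vectors."""
    cx, cy = kgram_counts(x, k), kgram_counts(y, k)
    # Only k-grams present in both sequences contribute to the sum.
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())


def manhattan_distance(x, y, k):
    """L1 distance between the two k-gram count vectors."""
    cx, cy = kgram_counts(x, k), kgram_counts(y, k)
    # A Counter returns 0 for a missing key, so the union is safe to scan.
    return sum(abs(cx[w] - cy[w]) for w in cx.keys() | cy.keys())


def jaccard_coefficient(x, y, k):
    """Shared k-grams divided by all distinct k-grams (binary embedding)."""
    sx, sy = set(kgram_counts(x, k)), set(kgram_counts(y, k))
    union = sx | sy
    # Two sequences with no k-grams at all are treated as identical here.
    return len(sx & sy) / len(union) if union else 1.0


if __name__ == "__main__":
    x, y = "abab", "abba"  # hypothetical example sequences
    print(linear_kernel(x, y, 2))
    print(manhattan_distance(x, y, 2))
    print(jaccard_coefficient(x, y, 2))
```

All three functions walk the same pair of count tables and differ only in how matching and mismatching k-grams are combined, which is the kind of shared structure a single generic algorithm can exploit.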

Cite

Text

Rieck et al. "Computation of Similarity Measures for Sequential Data Using Generalized Suffix Trees." Neural Information Processing Systems, 2006.

Markdown

[Rieck et al. "Computation of Similarity Measures for Sequential Data Using Generalized Suffix Trees." Neural Information Processing Systems, 2006.](https://mlanthology.org/neurips/2006/rieck2006neurips-computation/)

BibTeX

@inproceedings{rieck2006neurips-computation,
  title     = {{Computation of Similarity Measures for Sequential Data Using Generalized Suffix Trees}},
  author    = {Rieck, Konrad and Laskov, Pavel and Sonnenburg, Sören},
  booktitle = {Neural Information Processing Systems},
  year      = {2006},
  pages     = {1177-1184},
  url       = {https://mlanthology.org/neurips/2006/rieck2006neurips-computation/}
}