Computation of Similarity Measures for Sequential Data Using Generalized Suffix Trees
Abstract
We propose a generic algorithm for computation of similarity measures for sequential data. The algorithm uses generalized suffix trees for efficient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subsequences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefficients for sequences as alternatives to classical kernel functions.
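To make the idea of comparing sequences through an embedding language concrete, here is a minimal sketch for the k-gram case. It is not the suffix-tree algorithm of the paper: it assumes plain hash-map counting with Python's `collections.Counter`, and the function names (`kgram_counts`, `linear_kernel`, `manhattan_distance`) are hypothetical names chosen for illustration. It shows one kernel and one distance computed from the same k-gram counts.

```python
from collections import Counter


def kgram_counts(seq, k):
    """Count every contiguous substring of length k in seq (illustrative helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def linear_kernel(x, y, k):
    """Inner product of the k-gram count vectors of x and y."""
    cx, cy = kgram_counts(x, k), kgram_counts(y, k)
    # Iterate over the smaller map; k-grams missing from the other count as 0.
    if len(cx) > len(cy):
        cx, cy = cy, cx
    return sum(count * cy[gram] for gram, count in cx.items())


def manhattan_distance(x, y, k):
    """Sum of absolute differences of k-gram counts over the union of k-grams."""
    cx, cy = kgram_counts(x, k), kgram_counts(y, k)
    return sum(abs(cx[gram] - cy[gram]) for gram in cx.keys() | cy.keys())


if __name__ == "__main__":
    print(linear_kernel("abab", "baba", 2))
    print(manhattan_distance("abab", "baba", 2))
```

Unlike this sketch, whose cost and memory depend on the chosen k, the generalized suffix tree approach described in the abstract is stated to run in time linear in the sequence lengths regardless of the embedding language.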
Cite
Text
Rieck et al. "Computation of Similarity Measures for Sequential Data Using Generalized Suffix Trees." Neural Information Processing Systems, 2006.
Markdown
[Rieck et al. "Computation of Similarity Measures for Sequential Data Using Generalized Suffix Trees." Neural Information Processing Systems, 2006.](https://mlanthology.org/neurips/2006/rieck2006neurips-computation/)
BibTeX
@inproceedings{rieck2006neurips-computation,
title = {{Computation of Similarity Measures for Sequential Data Using Generalized Suffix Trees}},
author = {Rieck, Konrad and Laskov, Pavel and Sonnenburg, Sören},
booktitle = {Neural Information Processing Systems},
year = {2006},
pages = {1177-1184},
url = {https://mlanthology.org/neurips/2006/rieck2006neurips-computation/}
}