Mismatch String Kernels for SVM Protein Classification
Abstract
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.
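To make the feature map concrete: for every k-mer α occurring in a sequence, the (k, m)-mismatch feature map adds 1 to the coordinate of each k-mer β within Hamming distance m of α, and the kernel value is the inner product of two such feature vectors. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It enumerates each k-mer's mismatch neighborhood explicitly instead of using the mismatch tree described in the paper. This is simpler, but each k-mer costs on the order of k^m |Σ|^m work. The function names and the 20-letter amino-acid alphabet are illustrative choices.

```python
from collections import Counter
from itertools import combinations, product

# Assumed alphabet: the 20 standard amino acids (illustrative choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mismatch_neighborhood(kmer, m, alphabet):
    """Yield each string of len(kmer) over `alphabet` within Hamming distance m of kmer.

    Each neighbor is produced exactly once: a string at distance d corresponds to
    one set of d positions and one choice of differing symbols at those positions.
    """
    k = len(kmer)
    for d in range(m + 1):
        for positions in combinations(range(k), d):
            # At each chosen position, substitute a symbol different from the original.
            choices = [[c for c in alphabet if c != kmer[p]] for p in positions]
            for subs in product(*choices):
                s = list(kmer)
                for p, c in zip(positions, subs):
                    s[p] = c
                yield "".join(s)


def mismatch_features(seq, k, m, alphabet=AMINO_ACIDS):
    """Sparse (k, m)-mismatch feature vector: beta -> number of k-mers in seq within distance m of beta."""
    phi = Counter()
    for i in range(len(seq) - k + 1):
        for beta in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            phi[beta] += 1
    return phi


def mismatch_kernel(x, y, k, m, alphabet=AMINO_ACIDS):
    """Inner product of the two sparse mismatch feature vectors (hypothetical helper)."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    if len(fx) > len(fy):  # iterate over the smaller map
        fx, fy = fy, fx
    # Counter returns 0 for missing keys without inserting them.
    return sum(c * fy[b] for b, c in fx.items())
```

As a small check over the toy alphabet "AB" with k = 2, `mismatch_kernel("ABAB", "BABA", 2, 0, "AB")` counts exact shared 2-mers and gives 4. Allowing one mismatch (m = 1) raises the value to 22, because the neighborhoods of "AB" and "BA" both include "AA" and "BB". With m = 0 the construction reduces to counting exactly shared k-mers.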
Cite
Eskin et al. "Mismatch String Kernels for SVM Protein Classification." Neural Information Processing Systems, 2002.
@inproceedings{eskin2002neurips-mismatch,
title = {{Mismatch String Kernels for SVM Protein Classification}},
author = {Eskin, Eleazar and Weston, Jason and Noble, William S. and Leslie, Christina S.},
booktitle = {Neural Information Processing Systems},
year = {2002},
pages = {1441--1448},
url = {https://mlanthology.org/neurips/2002/eskin2002neurips-mismatch/}
}