Efficient Computation of Gapped Substring Kernels on Large Alphabets

Abstract

We present a sparse dynamic programming algorithm that, given two strings s and t , a gap penalty λ, and an integer p, computes the value of the gap-weighted length-p subsequences kernel. The algorithm works in time O(p |M| log |t|), where M = (i,j) | si = tj is the set of matches of characters in the two sequences. The algorithm is easily adapted to handle bounded length subsequences and different gap-penalty schemes, including penalizing by the total length of gaps and the number of gaps as well as incorporating character-specific match/gap penalties.

Cite

Text

Rousu and Shawe-Taylor. "Efficient Computation of Gapped Substring Kernels on Large Alphabets." Journal of Machine Learning Research, 2005.

Markdown

[Rousu and Shawe-Taylor. "Efficient Computation of Gapped Substring Kernels on Large Alphabets." Journal of Machine Learning Research, 2005.](https://mlanthology.org/jmlr/2005/rousu2005jmlr-efficient/)

BibTeX

@article{rousu2005jmlr-efficient,
  title     = {{Efficient Computation of Gapped Substring Kernels on Large Alphabets}},
  author    = {Rousu, Juho and Shawe-Taylor, John},
  journal   = {Journal of Machine Learning Research},
  year      = {2005},
  pages     = {1323-1344},
  volume    = {6},
  url       = {https://mlanthology.org/jmlr/2005/rousu2005jmlr-efficient/}
}