Relevant Subsequence Detection with Sparse Dictionary Learning

Abstract

Sparse Dictionary Learning has recently become popular for discovering latent components that can be used to reconstruct elements in a dataset. Analysis of sequence data could also benefit from this type of decomposition, but the Sparse Dictionary Learning model does not natively accept sequence datasets. A strategy for making sequence data more manageable is to extract all subsequences of a fixed length from the original sequence dataset. This subsequence representation can then be input to a Sparse Dictionary Learner. This strategy can be problematic, however, because self-similar patterns within sequences are over-represented in the extracted subsequences. In this work, we propose an alternative for applying Sparse Dictionary Learning to sequence datasets. We call this alternative Relevant Subsequence Dictionary Learning (RS-DL). Our method involves constructing separate dictionaries for each sequence in a dataset from shared sets of relevant subsequence patterns. Through experiments, we show that decompositions of sequence data induced by our RS-DL model can be effective both for discovering repeated patterns meaningful to humans and for extracting features useful for sequence classification.
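The Python below is a minimal sketch of the baseline strategy the abstract describes: extracting every fixed-length subsequence from each sequence before handing the result to a dictionary learner. It is not RS-DL itself. It assumes a sliding window with stride 1, and the function names `extract_subsequences` and `window_counts` and the toy dataset are hypothetical. The counts show why self-similar patterns end up over-represented: a single run of a repeated symbol yields many identical windows.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def extract_subsequences(seq: Sequence[str], k: int) -> List[Tuple[str, ...]]:
    """Return every contiguous length-k window of seq (hypothetical helper, stride 1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields no windows (the range is empty).
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def window_counts(dataset: Sequence[Sequence[str]], k: int) -> Counter:
    """Pool the length-k windows of every sequence and count how often each occurs."""
    counts: Counter = Counter()
    for seq in dataset:
        counts.update(extract_subsequences(seq, k))
    return counts


if __name__ == "__main__":
    # Toy dataset: the first sequence is one self-similar run of 'a'.
    toy = ["aaaaaaaa", "abcaxbc"]
    counts = window_counts(toy, 3)
    print(counts.most_common(2))
    print(sum(counts.values()), "windows in total")
```

Here the run `"aaaaaaaa"` contributes 6 identical windows `('a', 'a', 'a')` out of 11 in total, so in a pooled window matrix that one repeated pattern dominates the rows a dictionary learner would fit.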

Cite

Text

Blasiak et al. "Relevant Subsequence Detection with Sparse Dictionary Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2013. doi:10.1007/978-3-642-40988-2_26

Markdown

[Blasiak et al. "Relevant Subsequence Detection with Sparse Dictionary Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2013.](https://mlanthology.org/ecmlpkdd/2013/blasiak2013ecmlpkdd-relevant/) doi:10.1007/978-3-642-40988-2_26

BibTeX

@inproceedings{blasiak2013ecmlpkdd-relevant,
  title     = {{Relevant Subsequence Detection with Sparse Dictionary Learning}},
  author    = {Blasiak, Sam and Rangwala, Huzefa and Laskey, Kathryn B.},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2013},
  pages     = {401--416},
  doi       = {10.1007/978-3-642-40988-2_26},
  url       = {https://mlanthology.org/ecmlpkdd/2013/blasiak2013ecmlpkdd-relevant/}
}