Efficient Sequence Regression by Learning Linear Models in All-Subsequence Space

Abstract

We present a new approach for learning a sequence regression function, i.e., a mapping from sequential observations to a numeric score. Our learning algorithm employs coordinate gradient descent with Gauss-Southwell optimization in the feature space of all subsequences. We give a tight upper bound for the coordinate-wise gradients of the squared error loss, which enables efficient Gauss-Southwell selection. The proposed bound is built by separating the positive and the negative gradients of the loss function, and it exploits the structure of the feature space. Extensive experiments on simulated as well as real-world sequence regression benchmarks show that the bound is effective and that the proposed learning algorithm is efficient and accurate. The resulting linear regression model provides the user with a list of the most predictive features selected during the learning stage, which adds to the interpretability of the method. Code and data related to this chapter are available at: https://github.com/svgsponer/SqLoss .
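The following is a minimal Python sketch of the idea described in the abstract. It is not the authors' SqLoss implementation, which is in the linked repository. The sketch makes the following assumptions, none of which come from this page:

* Features are binary occurrences of contiguous substrings, grown one symbol to the right.
* The loss is the unnormalized squared error.
* Each coordinate step uses an exact line search.

The helper names (`grad_and_bound`, `select_feature`, `fit`, `predict`) are hypothetical.

The pruning rule relies on one property. Any extension of a substring `s` occurs in a subset of the sequences that contain `s`. Its gradient is therefore bounded by twice the larger of the summed positive residuals and the summed absolute negative residuals over the sequences containing `s`.

```python
"""Sketch (hypothetical names): Gauss-Southwell coordinate descent for squared
loss over all contiguous-substring features, with branch-and-bound pruning.
Assumptions: binary occurrence features, L = sum_i (y_i - f(x_i))^2,
exact line search along the selected coordinate."""


def grad_and_bound(sub, seqs, residuals):
    # Indices of the sequences that contain `sub` as a contiguous substring.
    idx = [i for i, s in enumerate(seqs) if sub in s]
    # dL/dbeta_sub = -2 * sum of residuals over the sequences containing sub.
    g = -2.0 * sum(residuals[i] for i in idx)
    # Any right-extension of sub occurs in a subset of idx, so its gradient
    # is bounded by separating positive and negative residual contributions.
    pos = sum(residuals[i] for i in idx if residuals[i] > 0)
    neg = -sum(residuals[i] for i in idx if residuals[i] < 0)
    bound = 2.0 * max(pos, neg)
    return g, bound, idx


def select_feature(seqs, residuals, alphabet, max_len):
    # Gauss-Southwell rule: find the substring with the largest |gradient|.
    best_sub, best_g, best_idx = None, 0.0, []
    stack = list(alphabet)
    while stack:
        sub = stack.pop()
        g, bound, idx = grad_and_bound(sub, seqs, residuals)
        if not idx:
            continue  # absent substring: all of its extensions are absent too
        if abs(g) > abs(best_g):
            best_sub, best_g, best_idx = sub, g, idx
        # Prune the subtree if no extension can beat the current best.
        if bound <= abs(best_g) or len(sub) >= max_len:
            continue
        stack.extend(sub + a for a in alphabet)
    return best_sub, best_g, best_idx


def fit(seqs, y, alphabet, max_len=5, iters=20):
    intercept = sum(y) / len(y)
    residuals = [yi - intercept for yi in y]
    beta = {}  # selected substring -> weight (the interpretable feature list)
    for _ in range(iters):
        sub, g, idx = select_feature(seqs, residuals, alphabet, max_len)
        if sub is None or abs(g) < 1e-12:
            break
        # Exact line search for a binary feature: mean residual over idx.
        step = sum(residuals[i] for i in idx) / len(idx)
        beta[sub] = beta.get(sub, 0.0) + step
        for i in idx:
            residuals[i] -= step
    return intercept, beta


def predict(seq, intercept, beta):
    return intercept + sum(w for sub, w in beta.items() if sub in seq)
```

For example, `fit(["ACGT", "AACG", "TTGA", "GGTT"], [1.0, 1.0, -1.0, -1.0], "ACGT")` returns an intercept and a dictionary of weighted substrings. Because each step is an exact line search along one coordinate, the training loss never increases from one iteration to the next.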

Cite

Text

Gsponer et al. "Efficient Sequence Regression by Learning Linear Models in All-Subsequence Space." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017. doi:10.1007/978-3-319-71246-8_3

Markdown

[Gsponer et al. "Efficient Sequence Regression by Learning Linear Models in All-Subsequence Space." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017.](https://mlanthology.org/ecmlpkdd/2017/gsponer2017ecmlpkdd-efficient/) doi:10.1007/978-3-319-71246-8_3

BibTeX

@inproceedings{gsponer2017ecmlpkdd-efficient,
  title     = {{Efficient Sequence Regression by Learning Linear Models in All-Subsequence Space}},
  author    = {Gsponer, Severin and Smyth, Barry and Ifrim, Georgiana},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2017},
  pages     = {37--52},
  doi       = {10.1007/978-3-319-71246-8_3},
  url       = {https://mlanthology.org/ecmlpkdd/2017/gsponer2017ecmlpkdd-efficient/}
}