Inference Algorithms for Pattern-Based CRFs on Sequence Data

Abstract

We consider \em Conditional Random Fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) x_1\ldots x_n is the sum of terms over intervals [i,j] where each term is non-zero only if the substring x_i\ldots x_j equals a prespecified pattern α. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively O(n L), O(n L \ell_\max) and O(n L \min{|D|,\log (\ell_\max + 1)}) where L is the combined length of input patterns, \ell_\max is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of \citeYe:NIPS09 whose complexities are respectively O(n L |D|), O\left(n |Γ| L^2 \ell_\max^2\right) and O(n L |D|), where |Γ| is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights. Finally, we apply pattern-based CRFs to the problem of the protein dihedral angles prediction.

Cite

Text

Takhanov and Kolmogorov. "Inference Algorithms for Pattern-Based CRFs on Sequence Data." International Conference on Machine Learning, 2013.

Markdown

[Takhanov and Kolmogorov. "Inference Algorithms for Pattern-Based CRFs on Sequence Data." International Conference on Machine Learning, 2013.](https://mlanthology.org/icml/2013/takhanov2013icml-inference/)

BibTeX

@inproceedings{takhanov2013icml-inference,
  title     = {{Inference Algorithms for Pattern-Based CRFs on Sequence Data}},
  author    = {Takhanov, Rustem and Kolmogorov, Vladimir},
  booktitle = {International Conference on Machine Learning},
  year      = {2013},
  pages     = {145-153},
  volume    = {28},
  url       = {https://mlanthology.org/icml/2013/takhanov2013icml-inference/}
}