A New Method for Predicting Protein Secondary Structures Based on Stochastic Tree Grammars

Abstract

We propose a new method for predicting the secondary structure of a protein from its amino acid sequence, based on a training algorithm for the probability parameters of a certain type of stochastic tree grammar. In particular, we concentrate on the problem of predicting β-sheet regions, which has previously been considered difficult because of the unbounded dependencies exhibited by sequences corresponding to β-sheets. To cope with this difficulty, we use a new family of stochastic tree grammars, called Stochastic Ranked Node Rewriting Grammars (SRNRG), which is powerful enough to capture the type of dependencies exhibited by the sequences of β-sheet regions, such as the ‘parallel’ and ‘anti-parallel’ dependencies and their combinations. Our learning algorithm is an adaptation of the ‘Inside-Outside’ algorithm (for stochastic CFGs) to SRNRG with two significant modifications. First, by placing a restriction on the form of SRNRG, we devised a simpler and faster learning algorithm. Second, the algorithm is equipped with a new iterative way of reducing the alphabet size (i.e., the number of amino acids) by clustering the amino acids according to their physico-chemical properties. Our preliminary experiments indicate that our method is able to capture and generalize the kind of long-distance dependencies exhibited by β-sheets, which was previously not possible. In these experiments, our method was able to predict the β-sheet regions of a protein that is less than 25 per cent homologous to the sequences in the training data.
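To make the ‘Inside-Outside’ starting point concrete, the following is a minimal sketch of the inside pass for an ordinary stochastic CFG in Chomsky normal form. It is not the SRNRG algorithm of the paper. The function name `inside_probability`, the toy grammar, and the two-letter alphabet (`h`, `p`) are all hypothetical and chosen only for illustration. The toy grammar generates nested, palindrome-like pairings, which loosely resemble anti-parallel dependencies; crossing (parallel-style) dependencies cannot be expressed by a context-free grammar, which is why a richer grammar family is needed.

```python
from collections import defaultdict


def inside_probability(sequence, binary_rules, lexical_rules, start="S"):
    """Return P(sequence) under a stochastic CFG in Chomsky normal form.

    binary_rules:  {(A, B, C): prob} for rules A -> B C
    lexical_rules: {(A, a): prob}    for rules A -> a
    """
    n = len(sequence)
    # inside[(i, j)][A] = probability that nonterminal A derives sequence[i:j]
    inside = defaultdict(lambda: defaultdict(float))

    # Spans of length 1 are filled from the lexical rules.
    for i, symbol in enumerate(sequence):
        for (lhs, terminal), prob in lexical_rules.items():
            if terminal == symbol:
                inside[(i, i + 1)][lhs] += prob

    # Longer spans sum over every split point k and every binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, left, right), prob in binary_rules.items():
                    p_left = inside[(i, k)].get(left, 0.0)
                    p_right = inside[(k, j)].get(right, 0.0)
                    if p_left and p_right:
                        inside[(i, j)][lhs] += prob * p_left * p_right

    return inside[(0, n)].get(start, 0.0)


# Hypothetical toy grammar (an assumption, not taken from the paper):
#   S -> h S h | p S p | h h | p p, rewritten in Chomsky normal form.
BINARY_RULES = {
    ("S", "H", "X"): 0.3, ("X", "S", "H"): 1.0,
    ("S", "P", "Y"): 0.3, ("Y", "S", "P"): 1.0,
    ("S", "H", "H"): 0.2, ("S", "P", "P"): 0.2,
}
LEXICAL_RULES = {("H", "h"): 1.0, ("P", "p"): 1.0}

nested = inside_probability("hpph", BINARY_RULES, LEXICAL_RULES)
crossing = inside_probability("hphp", BINARY_RULES, LEXICAL_RULES)
print(nested, crossing)
```

Under this toy grammar the nested string `hpph` receives non-zero probability, while `hphp`, whose pairings would have to cross, receives zero. A full Inside-Outside trainer would add the outside pass and re-estimate the rule probabilities from expected rule counts.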

Cite

Text

Abe and Mamitsuka. "A New Method for Predicting Protein Secondary Structures Based on Stochastic Tree Grammars." International Conference on Machine Learning, 1994. doi:10.1016/B978-1-55860-335-6.50009-X

Markdown

[Abe and Mamitsuka. "A New Method for Predicting Protein Secondary Structures Based on Stochastic Tree Grammars." International Conference on Machine Learning, 1994.](https://mlanthology.org/icml/1994/abe1994icml-new/) doi:10.1016/B978-1-55860-335-6.50009-X

BibTeX

@inproceedings{abe1994icml-new,
  title     = {{A New Method for Predicting Protein Secondary Structures Based on Stochastic Tree Grammars}},
  author    = {Abe, Naoki and Mamitsuka, Hiroshi},
  booktitle = {International Conference on Machine Learning},
  year      = {1994},
  pages     = {3--11},
  doi       = {10.1016/B978-1-55860-335-6.50009-X},
  url       = {https://mlanthology.org/icml/1994/abe1994icml-new/}
}