Protein Secondary Structure Prediction Based on Stochastic-Rule Learning

Abstract

This paper proposes a new strategy for predicting α-helix regions in any given protein sequence, based on the theory of learning stochastic rules. We confine our study to the problem of predicting where α-helix regions are located in a given protein sequence, rather than the conventional three-state prediction problem, i.e., that of predicting to which of the three states (α-helix, β-sheet, or coil) each amino acid in the sequence corresponds. Our strategy consists of three steps: generation of training examples, learning, and prediction. In the learning phase, we construct a rule for secondary-structure prediction from training examples. Here a rule is represented not as a deterministic rule but as a stochastic rule, i.e., a probability distribution that assigns, to each region in a sequence, a probability that it corresponds to α-helix. Each stochastic rule used here is further represented as the product of a number of stochastic rules with finite partitioning, developed by Yamanishi. Optimal stochastic rules with finite partitioning are obtained from training examples by Laplace estimation of real-valued parameters and by model selection based on the minimum description length (MDL) principle. We allow our stochastic rules to make use of not only the amino acid characters themselves but also their physico-chemical properties (i.e., numerical attributes such as hydrophobicity and molecular weight). In the prediction phase, given a test sequence, the likelihood that any given region (i.e., any subsequence of amino acids) in the test sequence corresponds to α-helix is calculated with the stochastic rules constructed in the learning phase. We evaluate the predictive performance of our strategy experimentally. In generating training examples, examples of α-helix regions are drawn from hemoglobin sequences alone. Experimental results show that the prediction accuracy of our strategy was 94.8% for hemoglobin α-chain (1HBSα), 68.5% for parvalbumin β (1CDP), and 73.6% for lysozyme c (1LYM), a significant improvement over the rate achieved with the Garnier-Osguthorpe-Robson (GOR) method.
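The abstract's combination of Laplace estimation and a product of rules can be illustrated with a minimal sketch. It is not the authors' implementation: the two-cell partition (`HYDROPHOBIC`), the helper names, and the toy training examples are all assumptions made for illustration, and MDL-based selection of the partition is omitted. Each cell's helix probability is estimated as (k + 1) / (n + 2), and a region is scored by the sum of log probabilities, i.e., the log of a product.

```python
import math

# Hypothetical two-cell partition of amino acids (an assumption, not from the paper).
HYDROPHOBIC = set("AVLIMFWC")

def cell_of(residue):
    """Map a one-letter amino acid code to its partition cell."""
    return "hydrophobic" if residue in HYDROPHOBIC else "other"

def laplace_estimate(k, n):
    """Laplace estimate of a Bernoulli parameter from k successes in n trials."""
    return (k + 1) / (n + 2)

def fit_rule(examples):
    """Estimate P(helix | cell) from (residue, is_helix) training pairs."""
    counts = {"hydrophobic": [0, 0], "other": [0, 0]}
    for residue, is_helix in examples:
        cell_counts = counts[cell_of(residue)]
        cell_counts[0] += int(is_helix)  # helix occurrences in this cell
        cell_counts[1] += 1              # all occurrences in this cell
    return {cell: laplace_estimate(k, n) for cell, (k, n) in counts.items()}

def region_log_score(region, rule):
    """Log of the product of per-residue helix probabilities over a region."""
    return sum(math.log(rule[cell_of(r)]) for r in region)

# Toy training data, invented for illustration only.
examples = [("A", True), ("L", True), ("G", False),
            ("P", False), ("L", True), ("S", True)]
rule = fit_rule(examples)
```

Under this toy rule, `region_log_score("ALL", rule)` exceeds `region_log_score("GPS", rule)`, so the first region would be ranked as the more helix-like of the two. Because the Laplace estimate never returns exactly 0 or 1, the logarithm is always defined, even for cells with no training examples.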

Cite

Text

Mamitsuka and Yamanishi. "Protein Secondary Structure Prediction Based on Stochastic-Rule Learning." International Conference on Algorithmic Learning Theory, 1992. doi:10.1007/3-540-57369-0_43

Markdown

[Mamitsuka and Yamanishi. "Protein Secondary Structure Prediction Based on Stochastic-Rule Learning." International Conference on Algorithmic Learning Theory, 1992.](https://mlanthology.org/alt/1992/mamitsuka1992alt-protein/) doi:10.1007/3-540-57369-0_43

BibTeX

@inproceedings{mamitsuka1992alt-protein,
  title     = {{Protein Secondary Structure Prediction Based on Stochastic-Rule Learning}},
  author    = {Mamitsuka, Hiroshi and Yamanishi, Kenji},
  booktitle = {International Conference on Algorithmic Learning Theory},
  year      = {1992},
  pages     = {240-251},
  doi       = {10.1007/3-540-57369-0_43},
  url       = {https://mlanthology.org/alt/1992/mamitsuka1992alt-protein/}
}