Neural Networks for Full-Scale Protein Sequence Classification: Sequence Encoding with Singular Value Decomposition
Abstract
A neural network classification method has been developed as an alternative approach to the search/organization problem of protein sequence databases. The neural networks used are three-layered, feed-forward, back-propagation networks. The protein sequences are encoded into neural input vectors by a hashing method that counts occurrences of n-gram words. A new SVD (singular value decomposition) method, which compresses the long and sparse n-gram input vectors and captures semantics of n-gram words, has improved the generalization capability of the network. A full-scale protein classification system has been implemented on a Cray supercomputer to classify unknown sequences into 3311 PIR (Protein Identification Resource) superfamilies/families at a speed of less than 0.05 CPU second per sequence. The sensitivity is close to 90% overall, and approaches 100% for large superfamilies. The system could be used to reduce the database search time and is being used to help organize the PIR protein sequence database.
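As an illustration of the n-gram counting step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a 20-letter amino acid alphabet, bigrams (n = 2), and a plain dictionary lookup in place of the paper's hashing function; the names `AMINO_ACIDS` and `ngram_vector` are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumed alphabet: the 20 standard amino acid one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_vector(sequence, n=2, alphabet=AMINO_ACIDS):
    """Return a fixed-length list of n-gram counts for one sequence."""
    # One slot per possible n-gram, ordered by position in the alphabet.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=n))}
    # Count every overlapping window of length n in the sequence.
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    vector = [0] * len(index)
    for gram, count in counts.items():
        if gram in index:  # skip windows containing letters outside the alphabet
            vector[index[gram]] = count
    return vector


# Hypothetical toy sequence, used only to exercise the function.
vec = ngram_vector("MKVLAAKV", n=2)
print(len(vec), sum(vec))
```

With 20 letters and n = 2 the vector has 400 slots, most of them zero for any one sequence, which is the long, sparse input the abstract refers to. The compression step would then apply a truncated SVD to a matrix of such vectors (for example via `numpy.linalg.svd`) and feed the reduced vectors to the network; that step is not shown here.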
Cite
Text
Wu et al. "Neural Networks for Full-Scale Protein Sequence Classification: Sequence Encoding with Singular Value Decomposition." Machine Learning, 1995. doi:10.1007/BF00993384
Markdown
[Wu et al. "Neural Networks for Full-Scale Protein Sequence Classification: Sequence Encoding with Singular Value Decomposition." Machine Learning, 1995.](https://mlanthology.org/mlj/1995/wu1995mlj-neural/) doi:10.1007/BF00993384
BibTeX
@article{wu1995mlj-neural,
title = {{Neural Networks for Full-Scale Protein Sequence Classification: Sequence Encoding with Singular Value Decomposition}},
author = {Wu, Cathy H. and Berry, Michael W. and Shivakumar, Sailaja},
journal = {Machine Learning},
year = {1995},
pages = {177-193},
doi = {10.1007/BF00993384},
volume = {21},
url = {https://mlanthology.org/mlj/1995/wu1995mlj-neural/}
}