Automated Discovery of Detectors and Iteration-Performing Calculations to Recognize Patterns in Protein Sequences Using Genetic Programming
Abstract
This paper describes an automated process for the dynamic creation of a pattern-recognizing computer program consisting of initially-unknown detectors, an initially-unknown iterative calculation incorporating the as-yet-uncreated detectors, and an initially-unspecified final calculation incorporating the results of the as-yet-uncreated iteration. The program's goal is to recognize a given protein segment as being a transmembrane domain or non-transmembrane area. The recognizing program to solve this problem will be evolved using the recently developed genetic programming paradigm. Genetic programming starts with a primordial ooze of randomly generated computer programs composed of available programmatic ingredients and then genetically breeds the population using the Darwinian principle of survival of the fittest and the genetic crossover (sexual recombination) operation. Automatic function definition enables genetic programming to dynamically create subroutines (detectors). When cross-validated, the best genetically-evolved recognizer achieves an out-of-sample correlation of 0.968 and an out-of-sample error rate of 1.6%. This error rate is better than that recently reported for five other methods.

1. Statement of the Problem

The goal in this paper is to use genetic programming with automatically defined functions (ADFs) to create a computer program for recognizing a given subsequence of amino acids in a protein as being a transmembrane domain or non-transmembrane area of the protein. The automated process that will create the recognizing program for this problem will be given a set of differently-sized protein segments and the correct classification for each segment. The recognizing program will consist of initially-unspecified detectors, an initially-unspecified iterative calculation incorporating the as-yet-undiscovered detectors, and an initially-unspecified final calculation
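
The three-part program structure described above (detectors, an iterative calculation over the segment, and a final calculation) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical, hand-written stand-in rather than the evolved program from the paper: the residue set `HYDROPHOBIC`, the scoring rule, and the function names are assumptions chosen for illustration. The `correlation` helper assumes the reported correlation is a Matthews-style coefficient computed from the 2x2 table of predicted versus correct classifications.

```python
from math import sqrt

# Assumed residue set for illustration only; the paper's detectors are evolved, not given.
HYDROPHOBIC = set("AFILMVW")


def detector(residue: str) -> bool:
    """Hypothetical stand-in for one automatically defined function (detector)."""
    return residue in HYDROPHOBIC


def classify(segment: str) -> bool:
    """Return True if the segment is classified as a transmembrane domain."""
    score = 0
    for residue in segment:  # iterative calculation over every residue of the segment
        score += 1 if detector(residue) else -1
    return score > 0  # final calculation applied to the result of the iteration


def correlation(predicted, actual) -> float:
    """Matthews-style correlation between predicted and correct classifications."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def error_rate(predicted, actual) -> float:
    """Fraction of segments whose predicted class differs from the correct class."""
    pairs = list(zip(predicted, actual))
    return sum(1 for p, a in pairs if p != a) / len(pairs)
```

In genetic programming, a fitness measure of this kind would be evaluated for each program in the population on the labeled segments, with the detector and calculation bodies being what evolution searches over instead of being fixed by hand as they are here.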
Cite
Text
Koza. "Automated Discovery of Detectors and Iteration-Performing Calculations to Recognize Patterns in Protein Sequences Using Genetic Programming." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1994. doi:10.1109/CVPR.1994.323778
Markdown
[Koza. "Automated Discovery of Detectors and Iteration-Performing Calculations to Recognize Patterns in Protein Sequences Using Genetic Programming." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1994.](https://mlanthology.org/cvpr/1994/koza1994cvpr-automated/) doi:10.1109/CVPR.1994.323778
BibTeX
@inproceedings{koza1994cvpr-automated,
title = {{Automated Discovery of Detectors and Iteration-Performing Calculations to Recognize Patterns in Protein Sequences Using Genetic Programming}},
author = {Koza, John R.},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {1994},
pages = {684-689},
doi = {10.1109/CVPR.1994.323778},
url = {https://mlanthology.org/cvpr/1994/koza1994cvpr-automated/}
}