Learning to Recognize Promoter Sequences in E. Coli by Modeling Uncertainty in the Training Data

Abstract

Automatic recognition of promoter sequences is an important open problem in molecular biology. Unfortunately, the usual machine learning version of this problem is critically flawed. In particular, the dataset available from the Irvine repository was drawn from a compilation of promoter sequences that were preprocessed to conform to the biologists’ related notion of the consensus sequence, a first-order approximation with a number of shortcomings that are well-known in molecular biology. Although concept descriptions learned from the Irvine data may represent the consensus sequence, they do not represent promoters. More generally, imperfections in preprocessed data and statistical variations in the locations of biologically meaningful features within the raw data invalidate standard attribute-based approaches. I suggest a dataset, a concept-description language, and a model of uncertainty in the promoter data that are all biologically justified, then address the learning problem with incremental probabilistic evidence combination. This knowledge-based approach yields a more accurate and more credible solution than other more conventional machine learning systems.

Cite

Text

Norton. "Learning to Recognize Promoter Sequences in E. Coli by Modeling Uncertainty in the Training Data." AAAI Conference on Artificial Intelligence, 1994.

Markdown

[Norton. "Learning to Recognize Promoter Sequences in E. Coli by Modeling Uncertainty in the Training Data." AAAI Conference on Artificial Intelligence, 1994.](https://mlanthology.org/aaai/1994/norton1994aaai-learning/)

BibTeX

@inproceedings{norton1994aaai-learning,
  title     = {{Learning to Recognize Promoter Sequences in E. Coli by Modeling Uncertainty in the Training Data}},
  author    = {Norton, Steven W.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1994},
  pages     = {657--663},
  url       = {https://mlanthology.org/aaai/1994/norton1994aaai-learning/}
}