Measuring Performance When Positives Are Rare: Relative Advantage Versus Predictive Accuracy - A Biological Case Study

Muggleton, Stephen H.; Bryant, Christopher H.; Srinivasan, Ashwin

doi:10.1007/3-540-45164-1_32

ECML-PKDD 2000 pp. 300-312

Abstract

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using our best NPP predictor as a filter is more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of (the rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the abundant negatives.
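The abstract's point about predictive accuracy can be illustrated with a small numeric sketch. The confusion-matrix counts below are hypothetical and are not taken from the paper. The helper `filter_efficiency_ratio` is also an assumption: it computes precision divided by prevalence as a simple proxy for how much more efficient filtering is than random selection, and it is not the paper's definition of Relative Advantage.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one recognition model (hypothetical data)."""
    tp: int  # positives the model accepts
    fp: int  # negatives the model accepts
    fn: int  # positives the model rejects
    tn: int  # negatives the model rejects

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def prevalence(self) -> float:
        # Chance that a randomly selected example is a positive.
        return (self.tp + self.fn) / self.total

    def precision(self) -> float:
        # Chance that an example accepted by the model is a positive.
        accepted = self.tp + self.fp
        if accepted == 0:
            raise ValueError("model accepts no examples")
        return self.tp / accepted


def filter_efficiency_ratio(c: Confusion) -> float:
    """Illustrative proxy only (an assumption, NOT the paper's RA formula):
    expected tests per hit with random selection (1 / prevalence) divided by
    expected tests per hit when testing only accepted examples (1 / precision).
    """
    return c.precision() / c.prevalence()


# Two hypothetical models over 10,000 examples containing 10 positives.
weak = Confusion(tp=2, fp=8, fn=8, tn=9982)
strong = Confusion(tp=8, fp=8, fn=2, tn=9982)

for name, model in (("weak", weak), ("strong", strong)):
    print(f"{name}: accuracy={model.accuracy():.4f} "
          f"ratio={filter_efficiency_ratio(model):.1f}")
```

In this sketch the two models cover very different numbers of positives, yet their accuracies differ by less than 0.001 because both reject almost all of the abundant negatives. The ratio, by contrast, separates them clearly.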

Cite

Text

Muggleton et al. "Measuring Performance When Positives Are Rare: Relative Advantage Versus Predictive Accuracy - A Biological Case Study." European Conference on Machine Learning, 2000. doi:10.1007/3-540-45164-1_32

Markdown

[Muggleton et al. "Measuring Performance When Positives Are Rare: Relative Advantage Versus Predictive Accuracy - A Biological Case Study." European Conference on Machine Learning, 2000.](https://mlanthology.org/ecmlpkdd/2000/muggleton2000ecml-measuring/) doi:10.1007/3-540-45164-1_32

BibTeX

@inproceedings{muggleton2000ecml-measuring,
  title     = {{Measuring Performance When Positives Are Rare: Relative Advantage Versus Predictive Accuracy - A Biological Case Study}},
  author    = {Muggleton, Stephen H. and Bryant, Christopher H. and Srinivasan, Ashwin},
  booktitle = {European Conference on Machine Learning},
  year      = {2000},
  pages     = {300-312},
  doi       = {10.1007/3-540-45164-1_32},
  url       = {https://mlanthology.org/ecmlpkdd/2000/muggleton2000ecml-measuring/}
}