Comparative Gene Prediction Using Conditional Random Fields

Abstract

Computational gene prediction using generative models has reached a plateau, with several groups converging on a generalized hidden Markov model (GHMM) that incorporates phylogenetic models of nucleotide sequence evolution. Further improvements in gene-calling accuracy are likely to come from new methods that incorporate additional data, both comparative and species-specific. Conditional Random Fields (CRFs), which directly model the conditional probability P(y|x) of a vector of hidden states given a set of observations, provide a unified framework for combining probabilistic and non-probabilistic information, and they have been shown to outperform HMMs on sequence labeling tasks in natural language processing. We describe the use of CRFs for comparative gene prediction. We implement a model that encapsulates both a phylogenetic GHMM (our baseline comparative model) and additional non-probabilistic features. We tested our model on the genome sequence of the fungal human pathogen Cryptococcus neoformans. Our baseline comparative model achieves accuracy comparable to that of the best available gene prediction tool for this organism. Moreover, we show that discriminative training and the incorporation of non-probabilistic evidence significantly improve performance. Our software implementation, Conrad, is freely available under an open source license at http://www.broad.mit.edu/annotation/conrad/.
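To make "directly model P(y|x)" concrete, the following is a minimal sketch of a linear-chain CRF in Python. It is not Conrad's implementation and is simpler than the GHMM-based model described above: the two labels (`coding`, `noncoding`), the emission table `EMIT`, the feature functions, and the weights `W` are all hypothetical and chosen for illustration only. It shows how a probabilistic feature (a log emission probability, as a generative model would supply) and a non-probabilistic feature (an indicator for hypothetical external evidence) enter the same weighted score, and how the forward algorithm computes the normalizer Z(x).

```python
import math
from itertools import product

# Toy linear-chain CRF: P(y|x) = exp(sum_i sum_k w_k f_k(y_{i-1}, y_i, x, i)) / Z(x).
# Hypothetical two-label gene model; NOT Conrad's actual state space or features.
LABELS = ["coding", "noncoding"]

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def local_score(prev, cur, x, i, weights, features):
    """Weighted sum of feature values at position i (prev is None at i == 0)."""
    return sum(w * f(prev, cur, x, i) for w, f in zip(weights, features))

def score(y, x, weights, features):
    """Unnormalized log-score of one complete label sequence y."""
    return sum(
        local_score(y[i - 1] if i > 0 else None, y[i], x, i, weights, features)
        for i in range(len(x))
    )

def log_partition(x, weights, features):
    """Forward algorithm: log Z(x) summed over all label sequences in O(n * |labels|^2)."""
    alpha = {s: local_score(None, s, x, 0, weights, features) for s in LABELS}
    for i in range(1, len(x)):
        alpha = {
            s: logsumexp([alpha[p] + local_score(p, s, x, i, weights, features) for p in LABELS])
            for s in LABELS
        }
    return logsumexp(list(alpha.values()))

def brute_force_log_partition(x, weights, features):
    """Exponential-time reference: enumerate every label sequence explicitly."""
    return logsumexp([score(list(y), x, weights, features)
                      for y in product(LABELS, repeat=len(x))])

def conditional_prob(y, x, weights, features):
    """P(y | x) under the CRF."""
    return math.exp(score(y, x, weights, features) - log_partition(x, weights, features))

# Probabilistic feature: log emission probability (made-up numbers for illustration).
EMIT = {
    "coding":    {"G": 0.3, "C": 0.3, "A": 0.2, "T": 0.2},
    "noncoding": {"G": 0.2, "C": 0.2, "A": 0.3, "T": 0.3},
}

def f_emission(prev, cur, x, i):
    return math.log(EMIT[cur][x[i][0]])

def f_same_state(prev, cur, x, i):
    # Transition feature: fires when the label does not change.
    return 1.0 if prev == cur else 0.0

def f_evidence(prev, cur, x, i):
    # Non-probabilistic feature: hypothetical external evidence flags position i as coding.
    return 1.0 if x[i][1] and cur == "coding" else 0.0

# Each observation is a (base, has_evidence) pair.
X = [("G", True), ("C", True), ("A", False), ("T", False)]
FEATS = [f_emission, f_same_state, f_evidence]
W = [1.0, 0.5, 2.0]  # fixed by hand here; discriminative training would fit these

print(conditional_prob(["coding", "coding", "noncoding", "noncoding"], X, W, FEATS))
```

Because the evidence indicator is just another weighted term in the score, it needs no probabilistic interpretation. Its weight, like the emission weight, is a parameter that discriminative training would set by maximizing the conditional likelihood of annotated sequences.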

Cite

Text

Vinson et al. "Comparative Gene Prediction Using Conditional Random Fields." Neural Information Processing Systems, 2006.

BibTeX

@inproceedings{vinson2006neurips-comparative,
  title     = {{Comparative Gene Prediction Using Conditional Random Fields}},
  author    = {Vinson, Jade P. and Decaprio, David and Pearson, Matthew D. and Luoma, Stacey and Galagan, James E.},
  booktitle = {Neural Information Processing Systems},
  year      = {2006},
  pages     = {1441--1448},
  url       = {https://mlanthology.org/neurips/2006/vinson2006neurips-comparative/}
}