BestORF
Prediction of potential coding fragments in EST/mRNA sequences.
The algorithm is based on a Markov chain model of coding regions and a probabilistic model that combines it with the start codon potential.
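To illustrate the general idea of scoring a fragment with a Markov chain model, here is a minimal sketch in Python. It is not the BestORF implementation: the first-order chain, the add-one smoothing, the tiny training sequences and the function names train_first_order and log_odds are all assumptions made for illustration. The model order, the training data and the way the score is combined with the start codon potential in BestORF are not described here.

```python
import math

ALPHABET = "ACGT"

def train_first_order(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev).

    Hypothetical helper: counts adjacent nucleotide pairs in the training
    sequences and applies additive smoothing so no probability is zero.
    Sequences are assumed to contain only uppercase A, C, G, T.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs

def log_odds(seq, coding, noncoding):
    """Sum of log(P_coding / P_noncoding) over adjacent nucleotide pairs.

    A positive score means the fragment looks more like the coding
    training set than the non-coding one under this toy model.
    """
    return sum(
        math.log(coding[p][n] / noncoding[p][n])
        for p, n in zip(seq, seq[1:])
    )

# Toy training sets (assumed, for illustration only).
coding_model = train_first_order(["GCGCGCGC"])
noncoding_model = train_first_order(["ATATATAT"])
```

With these toy models, log_odds("GCGC", coding_model, noncoding_model) is positive and log_odds("ATAT", coding_model, noncoding_model) is negative. A real coding-region model would normally be frame-dependent and trained on a large set of annotated sequences.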
Our tests show that the accuracy of frame recognition (identifying the true ORF) is about 100% for typical mRNA and about 99% for mRNA fragments of 500-800 bp that contain a partial coding region. Accuracy is lower for ESTs with frameshift errors and for ESTs with very short coding fragments.
The program outputs the positions of the potential CDS, determined by taking into account the probability of each potential start codon, as well as the positions of the longest ORF (an extension of the CDS upstream from the start codon). If all observed Met codons are recognized as internal, i.e. when the predicted translation start codon is missing from the sequence, the CDS and the ORF have the same positions.
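The relation between the CDS and the longest ORF can be sketched as follows. This is an illustrative Python sketch, not the program's code: the function name extend_orf_upstream is hypothetical, positions are 1-based as in the output, the sequence is assumed to be the predicted strand, and the standard stop codons TAA, TAG and TGA are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)

def extend_orf_upstream(seq, cds_start):
    """Return the 1-based start of the ORF that contains the CDS.

    Walks upstream from the CDS start one codon at a time, staying in
    frame, and stops at an in-frame stop codon or when fewer than three
    bases remain before the current position.
    """
    seq = seq.upper()
    pos = cds_start - 1  # 0-based index of the first CDS base
    while pos - 3 >= 0 and seq[pos - 3:pos] not in STOP_CODONS:
        pos -= 3
    return pos + 1

# No in-frame stop upstream: the ORF runs back to the first full codon.
no_stop = "CC" + "GCA" * 9 + "ATG" + "GCA" * 5
print(extend_orf_upstream(no_stop, 30))    # 3

# An in-frame stop upstream limits the extension.
with_stop = "TAA" + "GCA" * 2 + "ATG" + "GCA"
print(extend_orf_upstream(with_stop, 10))  # 4
```

In the first case a CDS starting at position 30 is extended to an ORF starting at position 3, the same pair of start positions as in the example output. The end position is shared by the CDS and the ORF, so only the start needs to be recomputed.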
Example of Output:
BestORF Prediction of potential coding fragment in plant EST/mRNA sequence
Time: Tue Feb 16 20:03:57 1999.
Seq name: Seq_name:
Length of sequence: 388
Predicted CDS 1 in +chain 1 in -chain 0
Position of predicted CDS/ORF:
  G Str  Feature   Start      End    Score     ORF        CDS-Len  Frame
  1  +   1 CDSo       30 -    386    30.57     3 -  386     357     +3
Predicted protein fragment:
>BestORF 1 1 fragment (s) 30 - 386 119 aa, chain +
MDELDILIVGGYWGKGSRGGMMSHFLCAVAEKPPPGEKPSVFHTLSRVGSGCTMKELYDL
GLKLAKYWKPFHRKAPPSSILCGTEKPEVYIEPCNSVIVQIKAAEIVPSDMYKTGCTLR
Abbreviations: G - gene (CDS/ORF), Str - Strand, CDS-Len - CDS Length.
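The numbers in the prediction row can be cross-checked with a few lines of Python. The sketch below is an illustration only: the function name parse_prediction_row is hypothetical, and the whitespace-separated field order is inferred from the single example row above, not from a format specification.

```python
def parse_prediction_row(line):
    """Split one prediction row into named fields.

    Assumed field order (taken from the example row):
    G, Str, Feature (two tokens), Start, '-', End, Score,
    ORF start, '-', ORF end, CDS-Len, Frame.
    """
    t = line.split()
    return {
        "gene": int(t[0]),
        "strand": t[1],
        "feature": f"{t[2]} {t[3]}",
        "cds_start": int(t[4]),
        "cds_end": int(t[6]),
        "score": float(t[7]),
        "orf_start": int(t[8]),
        "orf_end": int(t[10]),
        "cds_len": int(t[11]),
        "frame": int(t[12]),
    }

row = parse_prediction_row("1 + 1 CDSo 30 - 386 30.57 3 - 386 357 +3")

# CDS length is the inclusive span between Start and End: 386 - 30 + 1.
print(row["cds_end"] - row["cds_start"] + 1)  # 357

# The protein fragment has one residue per codon: 357 / 3.
print(row["cds_len"] // 3)                    # 119
```

Both values agree with the example: CDS-Len is 357 and the predicted protein fragment is 119 aa long.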