Fgenesh-c
A program for predicting multiple genes in genomic DNA sequences using an HMM gene model plus similarity to a known mRNA/EST.
The program can be used when you know an mRNA/EST sequence that is homologous to the gene being predicted. First, run an ab initio gene-finding program such as Fgenes or Fgenesh. Then run a BLAST database search with each predicted exon. If a homologous mRNA is found, use it to improve the accuracy of the assembly of your predicted gene.
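The BLAST step above needs each predicted exon as a separate query. Below is a minimal Python sketch of that step. It assumes 1-based, inclusive coordinates (as in the output example) and a plain DNA string. The names `Exon`, `exon_sequence` and `exons_to_fasta` are hypothetical and not part of Fgenes, Fgenesh or BLAST. The resulting multi-FASTA text would then be passed to whichever BLAST tool you use.

```python
"""Hypothetical helper: cut predicted exons out of a genomic sequence so
each one can be used as a separate BLAST query.

Assumes 1-based, inclusive coordinates and a plain DNA string; none of
these names come from Fgenes/Fgenesh.
"""
from dataclasses import dataclass

# Complement table for upper- and lower-case nucleotides (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass
class Exon:
    gene: int
    number: int
    strand: str  # "+" (direct) or "-" (complementary)
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def exon_sequence(genome: str, exon: Exon) -> str:
    """Slice one exon; exons on the '-' strand are reverse-complemented."""
    fragment = genome[exon.start - 1 : exon.end]
    return fragment if exon.strand == "+" else reverse_complement(fragment)


def exons_to_fasta(genome: str, exons: list[Exon]) -> str:
    """Build a multi-FASTA string with one record per predicted exon."""
    records = []
    for ex in exons:
        header = f">gene{ex.gene}_exon{ex.number} {ex.start}-{ex.end}({ex.strand})"
        records.append(header + "\n" + exon_sequence(genome, ex))
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    toy_genome = "AAATGCCCGGGTTTAAACCC"  # made-up sequence for illustration
    exons = [Exon(1, 1, "+", 3, 8), Exon(1, 2, "-", 12, 19)]
    print(exons_to_fasta(toy_genome, exons), end="")
    # >gene1_exon1 3-8(+)
    # ATGCCC
    # >gene1_exon2 12-19(-)
    # GGTTTAAA
```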
Ab initio gene prediction programs usually predict a significant fraction of the exons in a gene correctly, but they often assemble the gene incorrectly: they merge several genes into one, split one gene into several, skip exons, or include false exons. Using mRNA homology information found with one or several correctly predicted exons can significantly improve the accuracy of gene finding.
Program usage and output are similar to those of Fgenesh+. The output fields are:
G - predicted gene number, counted from the start of the sequence;
Str - DNA strand (+ for direct or - for complementary);
Feature - type of coding sequence: CDSf - first coding segment (starting with the start codon), CDSi - internal exon, CDSl - last coding segment (ending with the stop codon);
TSS - position of the transcription start (TATA-box position and score);
PolA - position of the polyadenylation signal and its score;
Start and End - start and end positions of the feature;
Weight - log likelihood * 10 score of the feature (the Score column in the output);
ORF - start and end positions of the open reading frame: where the first complete codon starts and where the last codon ends;
Last values: Len - length of the ORF segment; then the start and end positions of the matching region in the homologous sequence (whose length is given as "Length of homolog"), and the percent similarity to the homolog.
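To work with these columns programmatically, each line of the prediction table can be matched with a regular expression. The following is a hypothetical Python sketch, not part of Fgenesh-c. It assumes the whitespace-separated layout of the output example, treats the three homology values as optional, and recovers the log likelihood by dividing Weight by 10. The names `CdsFeature`, `Signal` and `parse_line` are illustrative.

```python
"""Hypothetical parser for the per-feature lines of an Fgenesh-c style
prediction table, following the column descriptions above.

Assumes whitespace-separated columns laid out as in the output example.
"""
import re
from dataclasses import dataclass
from typing import Optional, Union

# G Str Exon# Feature Start - End Score ORFstart - ORFend Len [homStart homEnd %sim]
CDS_LINE = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDS\w)\s+(\d+)\s+-\s+(\d+)\s+(-?\d+(?:\.\d+)?)"
    r"\s+(\d+)\s+-\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?))?\s*$"
)
# G Str Feature Position Score (TSS and PolA lines)
SIGNAL_LINE = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(TSS|PolA)\s+(\d+)\s+(-?\d+(?:\.\d+)?)\s*$"
)


@dataclass
class CdsFeature:
    gene: int
    strand: str
    exon: int
    feature: str          # CDSf, CDSi or CDSl in the documented output
    start: int
    end: int
    score: float          # Weight: log likelihood * 10
    orf_start: int
    orf_end: int
    length: int           # Len column
    hom_start: Optional[int] = None
    hom_end: Optional[int] = None
    similarity: Optional[float] = None

    @property
    def log_likelihood(self) -> float:
        """Undo the *10 scaling of the Weight/Score column."""
        return self.score / 10.0


@dataclass
class Signal:
    gene: int
    strand: str
    feature: str          # TSS or PolA
    position: int
    score: float


def parse_line(line: str) -> Union[CdsFeature, Signal, None]:
    """Return a CdsFeature, a Signal, or None for lines matching neither."""
    m = CDS_LINE.match(line)
    if m:
        g = m.groups()
        # Homology columns are optional; fill with None when absent.
        hom = (int(g[10]), int(g[11]), float(g[12])) if g[10] else (None, None, None)
        return CdsFeature(int(g[0]), g[1], int(g[2]), g[3], int(g[4]), int(g[5]),
                          float(g[6]), int(g[7]), int(g[8]), int(g[9]), *hom)
    m = SIGNAL_LINE.match(line)
    if m:
        g = m.groups()
        return Signal(int(g[0]), g[1], g[2], int(g[3]), float(g[4]))
    return None
```

Header lines and the protein sequence match neither pattern and return None, so a caller can simply skip them while iterating over the output.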
Output example:
FGENESHc 2.5 Prediction of potential genes in Homo_sapiens genomic DNA
Time    : Sun Jan 28 23:16:55 2007
Seq name: >HUMSFRS_8213_DNA_14-FEB-1996
Length of sequence: 6423
Homology: Q
Length of homolog: 817
Number of predicted genes 1 in +chain 1 in -chain 0
Number of predicted exons 8 in +chain 8 in -chain 0
Positions of predicted genes and exons: Variant 1 from 1, Score:437.471680

  G Str   Feature   Start        End     Score         ORF            Len

  1 +      TSS         16                 -7.39
  1 +    1 CDSf       151 -      178     59.16       151 -      177     27     1   78  100
  1 +    2 CDSi      1213 -     1393    118.23      1215 -     1391    177    79  259  100
  1 +    3 CDSi      1702 -     1878     97.79      1703 -     1876    174   260  436  100
  1 +    4 CDSi      2754 -     2828     40.58      2755 -     2826     72   437  511  100
  1 +    5 CDSi      3250 -     3360     38.73      3251 -     3358    108   512  622  100
  1 +    6 CDSi      4659 -     4712     23.03      4660 -     4710     51   623  676  100
  1 +    7 CDSi      5227 -     5262     24.08      5228 -     5260     33   677  712  100
  1 +    8 CDSl      6219 -     6273     52.07      6220 -     6273     54   713  817  100
  1 +      PolA      6378                 -6.78

Predicted protein(s):
>FGENESH:   1   8 exon (s)    151  -   6273    238 aa, chain +
MSRYGRYGGETKVYVGNLGTGAGKGELERAFSYYGPLRTVWIARNPPGFAFVEFEDPRDA
EDAVRGLDGKVICGSRVRVELSTGMPRRSRFDRPPARRPFDPNDRCYECGEKGHYAYDCH
RYSRRRRSRSRSRSHSRSRGRRYSRSRSRSRGRRSRSASPRRSRSISLRRSRSASLRRSR
SGSIKGSRYFQSPSRSRSRSRSISRPRSSRSKSRSPSPKRSRSPSGSPRRSASPERMD
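The numbers in this example are internally consistent, which makes a useful sanity check when reading such output. Summing the inclusive exon lengths (Start to End) gives 717 nt, that is 239 codons, or 238 amino acids once the stop codon at the end of the CDSl exon is excluded, matching "238 aa" in the protein header. Each Len value also equals ORF end minus ORF start plus 1. A small Python sketch of this arithmetic follows; the coordinates are copied from the example, and the function names are illustrative rather than anything Fgenesh-c provides.

```python
"""Arithmetic check on the exon table of the Fgenesh-c output example.

Assumes 1-based, inclusive coordinates and that the last exon (CDSl)
ends with the stop codon, as the field descriptions state.
"""

# (exon start, exon end, ORF start, ORF end, Len) copied from the example
EXONS = [
    (151, 178, 151, 177, 27),
    (1213, 1393, 1215, 1391, 177),
    (1702, 1878, 1703, 1876, 174),
    (2754, 2828, 2755, 2826, 72),
    (3250, 3360, 3251, 3358, 108),
    (4659, 4712, 4660, 4710, 51),
    (5227, 5262, 5228, 5260, 33),
    (6219, 6273, 6220, 6273, 54),
]


def coding_length(exons) -> int:
    """Total number of coding nucleotides over all exons (inclusive ends)."""
    return sum(end - start + 1 for start, end, *_ in exons)


def protein_length(exons) -> int:
    """Number of codons in the CDS minus the terminal stop codon."""
    total = coding_length(exons)
    if total % 3:
        raise ValueError(f"coding length {total} is not a multiple of 3")
    return total // 3 - 1


def len_matches_orf(exons) -> bool:
    """True if Len equals the inclusive span of the ORF columns for every exon."""
    return all(orf_end - orf_start + 1 == length
               for _, _, orf_start, orf_end, length in exons)


if __name__ == "__main__":
    print(coding_length(EXONS))    # 717
    print(protein_length(EXONS))   # 238, matching "238 aa" in the example
    print(len_matches_orf(EXONS))  # True
```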