Fgenesh-c

Program for predicting multiple genes in genomic DNA sequences using an HMM gene model plus similarity to a known mRNA/EST.

The program can be used when you know an mRNA/EST sequence that is homologous to the gene being predicted. First, run any ab initio gene-finding program, such as Fgenes or Fgenesh. Then run a BLAST database search with each predicted exon. If a homologous mRNA is found, use it to improve the accuracy of the assembly of your predicted gene.
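The homolog-search step above can be sketched in Python. This is a minimal illustration, not part of Fgenesh-c: it assumes the exon-vs-mRNA/EST search was run with BLAST+ tabular output (`-outfmt 6`, default 12 columns), and the function name `best_homolog`, the e-value cutoff, and the ranking rule (most exons hit, then highest summed bit score) are hypothetical choices.

```python
from collections import defaultdict

# BLAST+ -outfmt 6 default columns (0-based indices used below):
# 0 query id, 1 subject id, 2 pident, 3 length, 4 mismatch, 5 gapopen,
# 6 qstart, 7 qend, 8 sstart, 9 send, 10 evalue, 11 bitscore


def best_homolog(tabular_lines, max_evalue=1e-10):
    """Return (subject_id, total_bitscore, exons_hit) or None.

    Hypothetical heuristic: prefer the subject hit by the most predicted
    exons, breaking ties by the summed best bit score per exon.
    """
    # subject id -> {exon id -> best bit score for that exon}
    best_per_exon = defaultdict(dict)
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        exon_id, subject_id = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # assumed cutoff for a "homologous" hit
        prev = best_per_exon[subject_id].get(exon_id, 0.0)
        best_per_exon[subject_id][exon_id] = max(prev, bitscore)
    if not best_per_exon:
        return None

    def rank(item):
        _, exons = item
        return (len(exons), sum(exons.values()))

    subject_id, exons = max(best_per_exon.items(), key=rank)
    return subject_id, sum(exons.values()), len(exons)
```

Ranking first by the number of exons hit favors an mRNA that spans much of the predicted gene over one that matches a single exon strongly; the selected sequence would then be supplied to Fgenesh-c as the homolog.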

Ab initio gene prediction programs usually predict a significant fraction of the exons in a gene correctly, but they often assemble the gene incorrectly: they combine several genes into one, split one gene into several, skip exons, or include false exons. Using mRNA homology information found through one or several correctly predicted exons can significantly improve the accuracy of gene finding.

Program use and output are similar to those of Fgenesh+:
G - predicted gene number, counting from the start of the sequence;
Str - DNA strand (+ for direct or - for complementary);
Feature - type of coding segment: CDSf - first coding segment (starting with a start codon), CDSi - internal exon, CDSl - last coding segment (ending with a stop codon);
TSS - position of the transcription start (TATA-box position and score);
PolA - position of the polyadenylation signal (position and score);
Start and End - start and end positions of the feature;
Weight - log likelihood*10 score for the feature (the Score column);
ORF - start/end positions where the first complete codon starts and the last codon ends;
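Last values (after ORF): length of the coding part of the exon (Len, equal to ORF end - ORF start + 1), start and end positions of the matching region in the homologous mRNA/EST, and percent similarity to the homolog.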
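The field list above can be turned into a small parser for the coding-segment rows. This is a hedged sketch, not an official Fgenesh-c tool: the names `ExonRecord` and `parse_exon_line` are hypothetical, the unlabeled column after Str is assumed to be the exon number, and the last three columns are assumed to be homolog start, homolog end and percent similarity, as read from the output example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExonRecord:
    gene: int          # G
    strand: str        # Str: "+" or "-"
    exon: int          # assumed: exon number within the gene (unlabeled column)
    feature: str       # CDSf, CDSi or CDSl
    start: int
    end: int
    score: float       # Weight (Score column)
    orf_start: int
    orf_end: int
    length: int        # Len: ORF length in nucleotides
    hom_start: int     # assumed: start of the matching region in the homolog
    hom_end: int       # assumed: end of the matching region in the homolog
    similarity: float  # assumed: percent similarity to the homolog


def parse_exon_line(line: str) -> Optional[ExonRecord]:
    """Parse one coding-segment row of the prediction table.

    Returns None for TSS/PolA rows, headers, and anything that does not
    have the 15 whitespace-separated tokens of a CDS row.
    """
    tokens = line.split()
    if len(tokens) != 15 or not tokens[3].startswith("CDS"):
        return None
    # tokens[5] and tokens[9] are the "-" separators of "Start - End" and the ORF range
    if tokens[5] != "-" or tokens[9] != "-":
        return None
    return ExonRecord(
        gene=int(tokens[0]),
        strand=tokens[1],
        exon=int(tokens[2]),
        feature=tokens[3],
        start=int(tokens[4]),
        end=int(tokens[6]),
        score=float(tokens[7]),
        orf_start=int(tokens[8]),
        orf_end=int(tokens[10]),
        length=int(tokens[11]),
        hom_start=int(tokens[12]),
        hom_end=int(tokens[13]),
        similarity=float(tokens[14]),
    )
```

Splitting on whitespace rather than on " - " keeps the strand column intact for genes on the complementary (-) chain, where the strand itself is printed as "-".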

Output example:


 FGENESHc 2.5 Prediction of potential genes in Homo_sapiens genomic DNA
 Time    :   Sun Jan 28 23:16:55 2007
 Seq name: >HUMSFRS_8213_DNA_14-FEB-1996 
 Length of sequence: 6423 
 Homology: Q 
 Length of homolog: 817 
 Number of predicted genes 1 in +chain 1 in -chain 0
 Number of predicted exons 8 in +chain 8 in -chain 0
 Positions of predicted genes and exons: Variant   1 from   1, Score:437.471680 
   G Str   Feature   Start        End    Score           ORF           Len

   1 +      TSS         16               -7.39
   1 +    1 CDSf       151 -       178   59.16       151 -       177     27     1     78  100
   1 +    2 CDSi      1213 -      1393  118.23      1215 -      1391    177    79    259  100
   1 +    3 CDSi      1702 -      1878   97.79      1703 -      1876    174   260    436  100
   1 +    4 CDSi      2754 -      2828   40.58      2755 -      2826     72   437    511  100
   1 +    5 CDSi      3250 -      3360   38.73      3251 -      3358    108   512    622  100
   1 +    6 CDSi      4659 -      4712   23.03      4660 -      4710     51   623    676  100
   1 +    7 CDSi      5227 -      5262   24.08      5228 -      5260     33   677    712  100
   1 +    8 CDSl      6219 -      6273   52.07      6220 -      6273     54   713    817  100
   1 +      PolA      6378               -6.78

Predicted protein(s):
>FGENESH:   1   8 exon (s)    151  -   6273   238 aa, chain +
MSRYGRYGGETKVYVGNLGTGAGKGELERAFSYYGPLRTVWIARNPPGFAFVEFEDPRDA
EDAVRGLDGKVICGSRVRVELSTGMPRRSRFDRPPARRPFDPNDRCYECGEKGHYAYDCH
RYSRRRRSRSRSRSHSRSRGRRYSRSRSRSRGRRSRSASPRRSRSISLRRSRSASLRRSR
SGSIKGSRYFQSPSRSRSRSRSISRPRSSRSKSRSPSPKRSRSPSGSPRRSASPERMD
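The numbers in the example above can be cross-checked with a little arithmetic. In this example the summed exon length (End - Start + 1 over the eight CDS rows) is 717 = 3 × (238 + 1), which is consistent with the 238-aa protein plus a stop codon included in the CDSl segment, and the homolog segments (assumed to be the last-but-one pair of columns) cover positions 1 to 817 of the 817-long homolog without gaps. The sketch below is illustrative only; `summarize` is a hypothetical helper, not part of Fgenesh-c.

```python
# (start, end, homolog_start, homolog_end) for the eight CDS rows of the example above
EXONS = [
    (151, 178, 1, 78),
    (1213, 1393, 79, 259),
    (1702, 1878, 260, 436),
    (2754, 2828, 437, 511),
    (3250, 3360, 512, 622),
    (4659, 4712, 623, 676),
    (5227, 5262, 677, 712),
    (6219, 6273, 713, 817),
]
PROTEIN_LENGTH = 238   # from the ">FGENESH: ... 238 aa" header
HOMOLOG_LENGTH = 817   # "Length of homolog: 817"


def summarize(exons, protein_length, homolog_length):
    """Return (coding_nt, expected_nt, gaps) for one predicted gene.

    coding_nt   - total nucleotides in the exons (End - Start + 1, summed)
    expected_nt - 3 * (protein length + 1): one codon per residue plus a stop codon
    gaps        - homolog intervals not covered by any exon's homolog segment
    """
    coding_nt = sum(end - start + 1 for start, end, _, _ in exons)
    expected_nt = 3 * (protein_length + 1)
    gaps = []
    next_pos = 1  # first homolog position not yet covered
    for _, _, h_start, h_end in sorted(exons, key=lambda e: e[2]):
        if h_start > next_pos:
            gaps.append((next_pos, h_start - 1))
        next_pos = max(next_pos, h_end + 1)
    if next_pos <= homolog_length:
        gaps.append((next_pos, homolog_length))
    return coding_nt, expected_nt, gaps


print(summarize(EXONS, PROTEIN_LENGTH, HOMOLOG_LENGTH))  # prints (717, 717, [])
```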