Fgenes

Pattern-based human gene structure prediction (multiple genes, both chains).

Method description:
The algorithm is based on pattern recognition of different types of exons, promoters and polyA signals. The optimal combination of these features is then found by dynamic programming, and a set of gene models is constructed along the given sequence.
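
The dynamic programming step can be illustrated with a minimal sketch. It assumes a far simpler model than Fgenes itself: each candidate feature is reduced to a (start, end, score) interval, and the highest-scoring set of non-overlapping candidates is selected. The function name best_chain and the example scores are hypothetical. The real program must also respect feature types, strand and reading-frame compatibility, none of which is modelled here.

```python
from bisect import bisect_right


def best_chain(candidates):
    """Pick the max-score set of non-overlapping candidates.

    candidates: list of (start, end, score) with inclusive integer
    coordinates. This is an illustrative simplification, not the
    actual Fgenes scoring model.
    """
    items = sorted(candidates, key=lambda c: c[1])  # sort by end position
    ends = [c[1] for c in items]
    n = len(items)
    best = [0.0] * (n + 1)   # best[i]: best score using the first i items
    take = [False] * (n + 1)  # take[i]: whether item i is used in best[i]
    prev = [0] * (n + 1)     # prev[i]: how many items end before item i starts

    for i, (start, _end, score) in enumerate(items, 1):
        # Count earlier items whose end lies strictly before this start.
        p = bisect_right(ends, start - 1, 0, i - 1)
        prev[i] = p
        if best[p] + score > best[i - 1]:
            best[i] = best[p] + score
            take[i] = True
        else:
            best[i] = best[i - 1]

    # Trace back the chosen candidates.
    chosen = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(items[i - 1])
            i = prev[i]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen


if __name__ == "__main__":
    # Hypothetical candidates; the second overlaps the first and third.
    demo = [(100, 200, 5.0), (150, 300, 8.0), (250, 400, 4.0), (410, 500, 2.0)]
    total, chosen = best_chain(demo)
    print(total, chosen)
```

In this example the single best-scoring candidate (150-300) is rejected because the three compatible candidates around it score more in total, which is the kind of global trade-off that dynamic programming resolves.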

Fgenes output:
G - predicted gene number, starting from start of sequence;
Str - DNA strand (+ for direct and - for complementary strands);
Feature - type of coding sequence: CDSf - first coding segment (starting with start codon), CDSi - internal exon, CDSl - last coding segment (ending with stop codon);
TSS - position of transcription start;
TATA - position of TATA-box;
wTATA - discriminant function score for TATA-box;
PolA - position of polyA signal;
Start and End - Positions of the Feature;
Weight - Discriminant function score for the feature;
ORF-start and ORF-end - positions within the feature where the first complete codon starts and the last complete codon ends.
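
The fields listed above can be read programmatically. The following is a minimal sketch, assuming that the columns are separated by whitespace exactly as in the example output on this page. The function name parse_fgenes_line and the dictionary keys are hypothetical, and the trailing name/value pairs on TSS lines (TATA, wTATA, LDF) are simply stored under the names printed by the program.

```python
def parse_fgenes_line(line):
    """Parse one feature line of the Fgenes table into a dict.

    Returns None for headers, blank lines and anything else that is
    not a feature line. Assumes whitespace-separated columns as in
    the sample output; this is not an official format definition.
    """
    tok = line.split()
    if len(tok) < 5 or not tok[0].isdigit() or tok[1] not in ("+", "-"):
        return None
    gene, strand = int(tok[0]), tok[1]

    # TSS and PolA lines: a single position, a weight, optional extras.
    if tok[2] in ("TSS", "PolA"):
        rec = {"gene": gene, "strand": strand, "feature": tok[2],
               "start": int(tok[3]), "end": int(tok[3]),
               "weight": float(tok[4])}
        extras = tok[5:]
        for name, value in zip(extras[0::2], extras[1::2]):
            rec[name] = float(value)
        return rec

    # Coding segments: exon number, type, Start - End, Weight, ORF range.
    if len(tok) == 11 and tok[3].startswith("CDS"):
        return {"gene": gene, "strand": strand, "exon": int(tok[2]),
                "feature": tok[3],
                "start": int(tok[4]), "end": int(tok[6]),
                "weight": float(tok[7]),
                "orf_start": int(tok[8]), "orf_end": int(tok[10])}
    return None


if __name__ == "__main__":
    sample = "  3 +   1 CDSf   19541 -   19632   11.08   19541 -   19630"
    print(parse_fgenes_line(sample))
```

Applying the function to every line of a result file and discarding the None values gives a list of features that can be grouped by gene number.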


 FGENES 1.5 Prediction of multiple genes in genomic DNA
 Time: 171940.7 Date: 20001003       
 Seq name: > HUMHBB      73308 bp    DNA             PRI       20-JAN-1
 Length of sequence:   73308 GC content: 0.39 Zone: 1
 Number of predicted genes:   9 In +chain:   7 In -chain:   2
 Number of predicted exons:  23 In +chain:  19 In -chain:   4
 Positions of predicted genes and exons:
  G Str Feature  Start       End   Weight  ORF-start ORF-end
     
  1 -   1 CDSi    5978 -    6039    1.69    5978 -    6037
  1 -   2 CDSf    6314 -    6365    1.40    6315 -    6365
 
  2 -   1 CDSl   13709 -   13807    1.84   13712 -   13807
  2 -   2 CDSf   14781 -   14855    1.62   14781 -   14855
 
  3 +     TSS    19488              5.83 TATA  19457 wTATA   19.85 LDF   0.81
  3 +   1 CDSf   19541 -   19632   11.08   19541 -   19630
  3 +   2 CDSi   19755 -   19977    6.20   19756 -   19977
  3 +   3 CDSl   20833 -   20961    5.95   20833 -   20958
  3 +     PolA   21055              2.08
 
  4 +     TSS    34478              4.98 TATA  34447 wTATA   19.21 LDF   0.91
  4 +   1 CDSf   34531 -   34622    8.82   34531 -   34620
  4 +   2 CDSi   34745 -   34967    5.96   34746 -   34967
  4 +   3 CDSl   35854 -   35982    6.30   35854 -   35979
  4 +     PolA   36043              2.68
 
  5 +     TSS    39412              5.00 TATA  39383 wTATA   19.21 LDF   0.93
  5 +   1 CDSf   39467 -   39558    8.82   39467 -   39556
  5 +   2 CDSi   39681 -   39903    5.96   39682 -   39903
  5 +   3 CDSl   40770 -   40898    6.17   40770 -   40895
  5 +     PolA   40959              2.78
 
  6 +   1 CDSf   45995 -   46151    3.09   45995 -   46150
  6 +   2 CDSl   46997 -   47100    2.32   46999 -   47097
  6 +     PolA   47243              2.75
 
  7 +   1 CDSf   54790 -   54881    8.97   54790 -   54879
  7 +   2 CDSi   55010 -   55232    5.60   55011 -   55232
  7 +   3 CDSl   56131 -   56259    5.05   56131 -   56256
  7 +     PolA   56365              1.07
 
  8 +   1 CDSf   62187 -   62278    9.72   62187 -   62276
  8 +   2 CDSi   62409 -   62631    6.64   62410 -   62631
  8 +   3 CDSl   63482 -   63610    6.56   63482 -   63607
  8 +     PolA   63718              4.72
 
  9 +   1 CDSf   68183 -   68290    2.50   68183 -   68290
  9 +   2 CDSl   70703 -   70819    1.10   70703 -   70816
  9 +     PolA   70905              4.71
 
Predicted proteins:
>FGENES 1.5 > HUMHBB      7   1 Multiexon gene    5978 -    6365      38 a Ch-
MVCNCGLDHNFQSPRSKTCAFNKLIYTTSTLGSSSINE
>FGENES 1.5 > HUMHBB      7   2 Multiexon gene   13709 -   14855      57 a Ch-
MCSHHLASNCCFRSVPLPHLSRSLQEFVLKVNFHNRKLIEAKASVKERNISSKPLCC
>FGENES 1.5 > HUMHBB      7   3 Multiexon gene   19541 -   20961     147 a Ch+
MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPK
VKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFG
KEFTPEVQAAWQKLVSAVAIALAHKYH
>FGENES 1.5 > HUMHBB      7   4 Multiexon gene   34531 -   35982     147 a Ch+
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
KEFTPEVQASWQKMVTGVASALSSRYH
>FGENES 1.5 > HUMHBB      7   5 Multiexon gene   39467 -   40898     147 a Ch+
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
KEFTPEVQASWQKMVTAVASALSSRYH
>FGENES 1.5 > HUMHBB      7   6 Multiexon gene   45995 -   47100      86 a Ch+
MGNPKVKAHGKKVLISFGKAVMLTDDLKGTFATLSDLHCNKLHVDPENFLVSTLRQRDID
CFGNPLQRGFYPTDTGFLAVTNKCCG
>FGENES 1.5 > HUMHBB      7   7 Multiexon gene   54790 -   56259     147 a Ch+
MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFESFGDLSSPDAVMGNPK
VKAHGKKVLGAFSDGLAHLDNLKGTFSQLSELHCDKLHVDPENFRLLGNVLVCVLARNFG
KEFTPQMQAAYQKVVAGVANALAHKYH
>FGENES 1.5 > HUMHBB      7   8 Multiexon gene   62187 -   63610     147 a Ch+
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK
VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG
KEFTPPVQAAYQKVVAGVANALAHKYH
>FGENES 1.5 > HUMHBB      7   9 Multiexon gene   68183 -   70819      74 a Ch+
MEQSWAENDFDELREEGFRRSNYSKLKEEVRTNGKEASIILIPKPDRDTTKKENVTPISL
MNIDAKILNKILAN
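
The predicted protein records can be checked in the same spirit. The sketch below assumes that each header ends with "start - end", a length followed by "a", and the chain (Ch+ or Ch-), as in the example above. The function name read_proteins and the dictionary keys are hypothetical, and the other header fields are deliberately left unparsed.

```python
import re

# Tail of a header line: "<start> - <end> <length> a Ch<+|->"
HEADER_TAIL = re.compile(r"(\d+)\s+-\s+(\d+)\s+(\d+)\s+a\s+Ch([+-])\s*$")


def read_proteins(lines):
    """Collect predicted proteins from the 'Predicted proteins' block.

    Assumes the header layout seen in the sample output; raises
    ValueError if a header line does not end in the expected fields.
    """
    records = []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            m = HEADER_TAIL.search(line)
            if m is None:
                raise ValueError("unexpected header: " + line)
            records.append({"start": int(m.group(1)),
                            "end": int(m.group(2)),
                            "declared_length": int(m.group(3)),
                            "strand": m.group(4),
                            "sequence": ""})
        elif line and records:
            # Sequence lines are concatenated onto the latest record.
            records[-1]["sequence"] += line.strip()
    return records


if __name__ == "__main__":
    block = [
        ">FGENES 1.5 > HUMHBB      7   1 Multiexon gene    5978 -    6365      38 a Ch-",
        "MVCNCGLDHNFQSPRSKTCAFNKLIYTTSTLGSSSINE",
    ]
    for rec in read_proteins(block):
        ok = len(rec["sequence"]) == rec["declared_length"]
        print(rec["start"], rec["end"], rec["strand"], ok)
```

Comparing the sequence length with the length given in the header is a simple consistency check that catches truncated or merged records.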