Fgenes-M

Pattern-based prediction of multiple variants of gene structure

There are two reasons to predict several suboptimal variants of gene structure instead of only one:
1) Gene prediction algorithms for long genomic sequences are only 70-80% accurate on average, so the real gene structure may score slightly lower than the predicted optimal variant. Fgenes-M shows you alternative structures that you might otherwise never see.
2) Alternative splicing is quite common in mammalian genes, so if you rely on a single optimal prediction, even one supported by experimental data, you may miss real gene structures.
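The sample output for ACU08131 illustrates the second point: variants 1 and 3 share their first four coding exons and differ only at the 3' end (exon 5 ends at 3797 in variant 1 and at 3870 in variant 3, and the last exon is 4131-4247 versus 4857-5131). A minimal Python sketch for listing such differences between two predicted variants is shown below. The function and variable names are illustrative only, and a difference between two predictions is not by itself evidence of real alternative splicing; that still has to be tested experimentally.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) of a coding exon, as listed in Fgenes-M output


def compare_variants(a: List[Exon], b: List[Exon]) -> Tuple[List[Exon], List[Exon], List[Exon]]:
    """Split the exons of two predicted variants into shared, a-only and b-only lists."""
    set_a, set_b = set(a), set(b)
    shared = sorted(set_a & set_b)
    return shared, sorted(set_a - set_b), sorted(set_b - set_a)


# CDS exon coordinates of variants 1 and 3 from the sample output for ACU08131
var1 = [(521, 641), (1066, 1362), (1860, 2028), (2637, 2802), (3558, 3797), (4131, 4247)]
var3 = [(521, 641), (1066, 1362), (1860, 2028), (2637, 2802), (3558, 3870), (4857, 5131)]

shared, only1, only3 = compare_variants(var1, var3)
print("shared:", shared)
print("variant 1 only:", only1)
print("variant 3 only:", only3)
```

Exons are matched only when both boundaries are identical; partially overlapping exons such as 3558-3797 and 3558-3870 are reported as different.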

Of course, thousands of alternative gene structures can be predicted, and there is currently no established way to distinguish true variants from false ones.

The Fgenes-M variant has proved useful in commercial gene hunting, where it provides a set of possible gene structures for further experimental testing.

Method description:

The algorithm outputs several suboptimal variants of the predicted gene structure (up to 15, although this number can be changed). Like Fgenes, it is based on pattern recognition of different types of exons, promoters and polyA signals, and it finds the optimal combination of these elements by dynamic programming. A set of gene models along the given sequences is then constructed.
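The idea of keeping several suboptimal combinations in a dynamic program can be illustrated with a generic k-best chaining sketch in Python. It is not Fgenes-M's algorithm: it assumes each candidate exon has already been reduced to (start, end, weight), scores a gene model as the plain sum of its exon weights, and only requires exons to be ordered and non-overlapping. Exon types, splice-site and reading-frame compatibility, and promoter or polyA signals are not modeled, and `k_best_chains` is a hypothetical name.

```python
import heapq
from typing import List, Tuple

Segment = Tuple[int, int, float]      # (start, end, weight), inclusive coordinates
Chain = Tuple[float, List[Segment]]   # (total weight, segments in order)


def k_best_chains(segments: List[Segment], k: int) -> List[Chain]:
    """Return up to k highest-scoring chains of ordered, non-overlapping segments.

    Generic k-best dynamic programming over an interval graph: a chain's score is
    the sum of its segment weights. Keeping only the k best chains ending at each
    segment is enough, because any chain in the global top k is built from a
    prefix that is itself among the top k chains ending at its last segment.
    """
    segs = sorted(segments, key=lambda s: (s[1], s[0]))
    best: List[List[Chain]] = []                          # best[i]: top-k chains ending at segs[i]
    for i, (start, _end, weight) in enumerate(segs):
        candidates: List[Chain] = [(weight, [segs[i]])]   # the segment on its own
        for j in range(i):
            if segs[j][1] < start:                        # segs[j] ends before segs[i] starts
                for score, chain in best[j]:
                    candidates.append((score + weight, chain + [segs[i]]))
        best.append(heapq.nlargest(k, candidates, key=lambda c: c[0]))
    every_chain = [c for chains in best for c in chains]
    return heapq.nlargest(k, every_chain, key=lambda c: c[0])


# Hypothetical candidate exons: (start, end, weight)
candidates = [(1, 10, 2.0), (5, 15, 3.0), (12, 20, 1.5), (22, 30, 2.5)]
for score, chain in k_best_chains(candidates, 3):
    print(score, chain)
```

Here the best chain (score 6.0) uses the segments at 1-10, 12-20 and 22-30, followed by 5-15 plus 22-30 (5.5) and 1-10 plus 22-30 (4.5). Because all weights are positive, fragments of the best model also show up among the suboptimal results; constraints such as requiring a first and a last exon would remove many of them.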

You can compare the validity of predicted variants using the GENE WEIGHT parameter: if alternative variants have similar GENE WEIGHT values, it is reasonable to consider each of them.
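One way to turn "similar GENE WEIGHT" into a concrete filter is sketched below in Python. The relative cutoff `min_ratio` is a hypothetical, user-chosen parameter, not something Fgenes-M defines. The weights are the GENE WEIGHT values of variants 1-5 in the sample output, where variant 1 (24.1) stands well above variants 2-5 (13.0-15.1), which are close to one another.

```python
from typing import List, Tuple


def similar_variants(weights: List[float], min_ratio: float) -> List[Tuple[int, float]]:
    """Return (variant number, GENE WEIGHT) for variants whose weight is at least
    min_ratio times the best weight. Variants are numbered from 1 in input order.
    min_ratio is a hypothetical cutoff chosen by the user."""
    best = max(weights)
    return [(i, w) for i, w in enumerate(weights, start=1) if w >= min_ratio * best]


# GENE WEIGHT values of variants 1-5 in the sample output for ACU08131
sample = [24.1, 15.1, 14.3, 13.9, 13.0]
for number, weight in similar_variants(sample, 0.55):
    print(f"variant {number}: GENE WEIGHT {weight} ({weight / max(sample):.0%} of best)")
```

A ratio to the best variant is only one possible choice; comparing each variant with its immediate neighbor in the ranked list would be an equally reasonable sketch.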

Fgenes-M output:


 FGENES-M 1.5.0 Prediction of several variants of  multiple genes
 Time: 175701.1 Date: 19981005       
 Seq name:  ACU08131                                                   
 Length of sequence:    5392 GC content: 0.46 Zone: 2
 Number of predicted genes:   1 In +chain:   1 In -chain:   0
 Number of predicted exons:   6 In +chain:   6 In -chain:   0
 Predicted genes and exons in var:   1 Max var=   10 GENE WEIGHT:   24.1
  G Str Feature  Start       End   Weight  ORF-start ORF-end
     
  1 +     TSS      355              7.43 TATA    327 wTATA   21.08 LDF   0.56
  1 +   1 CDSf     521 -     641    1.23     521 -     640
  1 +   2 CDSi    1066 -    1362    2.08    1068 -    1361
  1 +   3 CDSi    1860 -    2028    1.69    1862 -    2026
  1 +   4 CDSi    2637 -    2802    2.74    2638 -    2802
  1 +   5 CDSi    3558 -    3797    4.35    3558 -    3797
  1 +   6 CDSl    4131 -    4247    2.09    4131 -    4244
  1 +     PolA    4650              3.17
 
Predicted proteins:
>FGENES-M 1.5  ACU08131         1 Multiexon gene     521 -    4247     369 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGY
AFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKVDDGSELSSTSRTEVSS
VSNSSVSPA
 FGENES-M 1.5.0 Prediction of several variants of  multiple genes
 Time: 175701.1 Date: 19981005       
 Seq name:  ACU08131                                                   
 Length of sequence:    5392 GC content: 0.46 Zone: 2
 Number of predicted genes:   1 In +chain:   1 In -chain:   0
 Number of predicted exons:   6 In +chain:   6 In -chain:   0
 Predicted genes and exons in var:   2 Max var=   10 GENE WEIGHT:   15.1
  G Str Feature  Start       End   Weight  ORF-start ORF-end
     
  1 +   1 CDSf     218 -     321    1.01     218 -     319
  1 +   2 CDSi     984 -    1023    1.94     986 -    1021
  1 +   3 CDSi    1860 -    2028    1.49    1862 -    2026
  1 +   4 CDSi    2675 -    2802    1.00    2676 -    2801
  1 +   5 CDSi    3558 -    3797    4.35    3558 -    3797
  1 +   6 CDSl    4131 -    4247    2.09    4131 -    4244
  1 +     PolA    4650              3.17
 
Predicted proteins:
>FGENES-M 1.5  ACU08131         1 Multiexon gene     218 -    4247     265 a Ch+
MRQGGGQITAQLRDKTFKGFEDLVLQVRGLIRLGGNLLVDVCVVIAILVSQLSGPWPLYL
GNAGSLSASPLEMSSSMPNWPWLALSSPGCGLLYGQHHPSLAGVDVFSGSDDPGVLSYMI
VLMITCCFIPLAVILLCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCW
GPYTVFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKV
DDGSELSSTSRTEVSSVSNSSVSPA
 FGENES-M 1.5.0 Prediction of several variants of  multiple genes
 Time: 175701.1 Date: 19981005       
 Seq name:  ACU08131                                                   
 Length of sequence:    5392 GC content: 0.46 Zone: 2
 Number of predicted genes:   1 In +chain:   1 In -chain:   0
 Number of predicted exons:   6 In +chain:   6 In -chain:   0
 Predicted genes and exons in var:   3 Max var=   10 GENE WEIGHT:   14.3
  G Str Feature  Start       End   Weight  ORF-start ORF-end
     
  1 +     TSS      355              7.43 TATA    327 wTATA   21.08 LDF   0.56
  1 +   1 CDSf     521 -     641    1.23     521 -     640
  1 +   2 CDSi    1066 -    1362    2.08    1068 -    1361
  1 +   3 CDSi    1860 -    2028    1.69    1862 -    2026
  1 +   4 CDSi    2637 -    2802    2.74    2638 -    2802
  1 +   5 CDSi    3558 -    3870    0.78    3558 -    3869
  1 +   6 CDSl    4857 -    5131    2.37    4859 -    5128
  1 +     PolA    5187              0.77
 
Predicted proteins:
>FGENES-M 1.5  ACU08131         1 Multiexon gene     521 -    5131     446 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGY
AFHPLAAALPAYFAKSATIYNPIIYVFMNRQVIFCVPKWTVTGLARRVQKREGCMVFTGA
RECIEGGQEEEKFVPRGVCASAKSNALNLNSVESGHDSDTGRTNETQHDPPRSLQGLCAS
SQHGSTGTILYIVFDTKACCVPGTSS
 FGENES-M 1.5.0 Prediction of several variants of  multiple genes
 Time: 175701.1 Date: 19981005       
 Seq name:  ACU08131                                                   
 Length of sequence:    5392 GC content: 0.46 Zone: 2
 Number of predicted genes:   1 In +chain:   1 In -chain:   0
 Number of predicted exons:   6 In +chain:   6 In -chain:   0
 Predicted genes and exons in var:   4 Max var=   10 GENE WEIGHT:   13.9
  G Str Feature  Start       End   Weight  ORF-start ORF-end
     
  1 +     TSS      355              7.43 TATA    327 wTATA   21.08 LDF   0.56
  1 +   1 CDSf     521 -     641    1.23     521 -     640
  1 +   2 CDSi    1066 -    1362    2.08    1068 -    1361
  1 +   3 CDSi    1860 -    2028    1.69    1862 -    2026
  1 +   4 CDSi    2637 -    2802    2.74    2638 -    2802
  1 +   5 CDSi    3558 -    3668    0.99    3558 -    3668
  1 +   6 CDSl    4131 -    4247    2.09    4131 -    4244
  1 +     PolA    4650              3.17
 
Predicted proteins:
>FGENES-M 1.5  ACU08131         1 Multiexon gene     521 -    4247     326 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTFRNCIMQLFGKK
VDDGSELSSTSRTEVSSVSNSSVSPA
 FGENES-M 1.5.0 Prediction of several variants of  multiple genes
 Time: 175701.1 Date: 19981005       
 Seq name:  ACU08131                                                   
 Length of sequence:    5392 GC content: 0.46 Zone: 2
 Number of predicted genes:   1 In +chain:   1 In -chain:   0
 Number of predicted exons:   5 In +chain:   5 In -chain:   0
 Predicted genes and exons in var:   5 Max var=   10 GENE WEIGHT:   13.0
  G Str Feature  Start       End   Weight  ORF-start ORF-end
     
  1 +     TSS      355              7.43 TATA    327 wTATA   21.08 LDF   0.56
  1 +   1 CDSf     521 -     641    1.23     521 -     640
  1 +   2 CDSi    1066 -    1362    2.08    1068 -    1361
  1 +   3 CDSi    1860 -    2028    1.69    1862 -    2026
  1 +   4 CDSi    2637 -    2802    2.74    2638 -    2802
  1 +   5 CDSl    3558 -    3875    2.10    3558 -    3872
  1 +     PolA    4650              3.17
 
Predicted proteins:
>FGENES-M 1.5  ACU08131         1 Multiexon gene     521 -    3875     356 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGY
AFHPLAAALPAYFAKSATIYNPIIYVFMNRQVIFCVPKWTVTGLARRVQKREGCMG
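The output above can also be read programmatically. The Python sketch below is a minimal example whose regular expressions are inferred from this sample only, so they may not cover every line type Fgenes-M emits. As a consistency check it relies on an observation from this sample: for all five variants, the summed lengths of the CDS exons (Start to End, inclusive) divided by three, minus one codon for the stop, equals the protein length in the `>` header (for variant 1, 1110 nt gives 370 codons and a 369 aa protein). `Variant`, `parse_variants` and `protein_length_from_exons` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

VAR_RE = re.compile(r"exons in var:\s*(\d+).*GENE WEIGHT:\s*([\d.]+)")
EXON_RE = re.compile(r"^\s*\d+\s+[+-]\s+\d+\s+(CDS\w)\s+(\d+)\s+-\s+(\d+)\s+([\d.]+)")
PROT_RE = re.compile(r"^>.*?(\d+)\s+a\s+Ch[+-]")


@dataclass
class Variant:
    number: int
    gene_weight: float
    exons: List[Tuple[str, int, int, float]] = field(default_factory=list)  # (type, start, end, weight)
    protein_len: int = 0


def parse_variants(text: str) -> List[Variant]:
    """Collect GENE WEIGHT, CDS exons and protein length for each variant block."""
    variants: List[Variant] = []
    for line in text.splitlines():
        if m := VAR_RE.search(line):
            variants.append(Variant(int(m.group(1)), float(m.group(2))))
        elif variants and (m := EXON_RE.match(line)):
            kind, start, end, weight = m.groups()
            variants[-1].exons.append((kind, int(start), int(end), float(weight)))
        elif variants and (m := PROT_RE.match(line)):
            variants[-1].protein_len = int(m.group(1))
    return variants


def protein_length_from_exons(v: Variant) -> int:
    # Assumption inferred from this sample: CDS coordinates include the stop codon
    coding_nt = sum(end - start + 1 for _, start, end, _ in v.exons)
    return coding_nt // 3 - 1


# Variants 1 and 5 copied from the sample output above
SAMPLE = """\
 Predicted genes and exons in var:   1 Max var=   10 GENE WEIGHT:   24.1
  G Str Feature  Start       End   Weight  ORF-start ORF-end
  1 +     TSS      355              7.43 TATA    327 wTATA   21.08 LDF   0.56
  1 +   1 CDSf     521 -     641    1.23     521 -     640
  1 +   2 CDSi    1066 -    1362    2.08    1068 -    1361
  1 +   3 CDSi    1860 -    2028    1.69    1862 -    2026
  1 +   4 CDSi    2637 -    2802    2.74    2638 -    2802
  1 +   5 CDSi    3558 -    3797    4.35    3558 -    3797
  1 +   6 CDSl    4131 -    4247    2.09    4131 -    4244
  1 +     PolA    4650              3.17
>FGENES-M 1.5  ACU08131         1 Multiexon gene     521 -    4247     369 a Ch+
 Predicted genes and exons in var:   5 Max var=   10 GENE WEIGHT:   13.0
  1 +     TSS      355              7.43 TATA    327 wTATA   21.08 LDF   0.56
  1 +   1 CDSf     521 -     641    1.23     521 -     640
  1 +   2 CDSi    1066 -    1362    2.08    1068 -    1361
  1 +   3 CDSi    1860 -    2028    1.69    1862 -    2026
  1 +   4 CDSi    2637 -    2802    2.74    2638 -    2802
  1 +   5 CDSl    3558 -    3875    2.10    3558 -    3872
  1 +     PolA    4650              3.17
>FGENES-M 1.5  ACU08131         1 Multiexon gene     521 -    3875     356 a Ch+
"""

for v in parse_variants(SAMPLE):
    print(v.number, v.gene_weight, len(v.exons), v.protein_len, protein_length_from_exons(v))
```

TSS and PolA lines are skipped because they carry no exon number; the unassigned-looking "G Str Feature" header line is skipped for the same reason.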