Fgenes-m
Pattern-based prediction of multiple variants of gene structure
There are two reasons to predict several sub-optimal variants of gene structure instead of only one:
1) Gene prediction algorithms for long genomic sequences are only 70-80% accurate on average, so the real gene structure may score slightly lower than the predicted optimal variant. Fgenes-m shows you alternative structures that you might otherwise never see.
2) Alternative splicing is common in mammalian genes, so relying on a single optimal prediction, even one supported by experimental data, may miss real gene structures.
Of course, thousands of alternative gene structures can be predicted, and there is currently no established way to distinguish true variants from false ones.
Nevertheless, the Fgenes-m variant has proved useful for providing a set of candidate gene structures for experimental testing in commercial gene hunting.
The algorithm outputs several suboptimal variants of the predicted gene structure (up to 15, though this number can be changed). It is similar to Fgenes: it recognizes patterns of different types of exons, promoters and polyA signals, and uses dynamic programming to find the optimal combination of them. A set of gene models is then constructed along the given sequences.
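To illustrate how dynamic programming can yield several ranked variants rather than a single optimum, here is a minimal sketch of generic k-best chaining of scored exon candidates. It is not the Fgenes-m implementation, whose scoring and recursion are not described here. The exon tuples, the toy scores and the name `k_best_chains` are hypothetical, and the compatibility rule is deliberately simplified (no reading-frame, splice-phase or exon-type checks).

```python
import heapq
from typing import List, Tuple

# A scored candidate exon: (start, end, score), 1-based inclusive coordinates.
Exon = Tuple[int, int, float]
Chain = Tuple[float, List[Exon]]

def k_best_chains(exons: List[Exon], k: int) -> List[Chain]:
    """Return up to k highest-scoring chains of non-overlapping exons.

    Simplification (assumption): exon B may follow exon A whenever B starts
    after A ends. A real gene finder would also enforce reading-frame and
    splice-phase compatibility and exon-type order (first, internal, last).
    """
    exons = sorted(exons, key=lambda e: e[1])  # process exons by end coordinate
    best: List[List[Chain]] = []  # best[i]: top-k chains that end with exons[i]
    for i, (start, _end, score) in enumerate(exons):
        candidates: List[Chain] = [(score, [exons[i]])]  # exon i on its own
        for j in range(i):
            if exons[j][1] < start:  # exon j ends before exon i begins
                for prev_score, chain in best[j]:
                    candidates.append((prev_score + score, chain + [exons[i]]))
        best.append(heapq.nlargest(k, candidates, key=lambda c: c[0]))
    # The k best chains overall are contained in the per-exon top-k lists.
    every_chain = [c for per_exon in best for c in per_exon]
    return heapq.nlargest(k, every_chain, key=lambda c: c[0])

# Hypothetical toy candidates, not taken from any real prediction.
toy = [(1, 10, 2.0), (5, 20, 3.0), (15, 30, 1.0), (25, 40, 2.5)]
for total, chain in k_best_chains(toy, k=3):
    print(total, chain)
```

Keeping only the top k chains per end exon is sufficient: if a chain's prefix were not among the k best chains ending at its last exon, k better prefixes could be extended in the same way. On the toy input the best chain joins (5, 20) and (25, 40) for a total of 5.5, and the runner-up, (1, 10) followed by (25, 40), scores 4.5.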
You can compare the validity of predicted variants using the GENE WEIGHT parameter. If alternative variants have a GENE WEIGHT close to that of the best variant, it is reasonable to consider them as well.
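A minimal sketch of such a comparison, assuming variants are judged by how far their GENE WEIGHT falls below the highest one. The function `close_variants` and its `max_gap` threshold are hypothetical and user-chosen; Fgenes-m itself only reports the weights. The values are the GENE WEIGHT figures of variants 1-5 in the ACU08131 sample output.

```python
from typing import Dict, List

def close_variants(weights: Dict[int, float], max_gap: float) -> List[int]:
    """Return variant numbers whose GENE WEIGHT lies within max_gap of the best.

    max_gap is a hypothetical, user-chosen threshold; it is not an Fgenes-m option.
    """
    best = max(weights.values())
    return sorted(var for var, w in weights.items() if best - w <= max_gap)

# GENE WEIGHT values of variants 1-5 in the ACU08131 sample output
sample = {1: 24.1, 2: 15.1, 3: 14.3, 4: 13.9, 5: 13.0}
print(close_variants(sample, max_gap=2.0))   # [1]
print(close_variants(sample, max_gap=10.0))  # [1, 2, 3]
```

In this sample the optimal variant (24.1) stands well clear of the alternatives, which cluster between 13.0 and 15.1, so a tight threshold keeps only variant 1.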
Sample output for sequence ACU08131:

FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 175701.1  Date: 19981005
Seq name: ACU08131
Length of sequence: 5392  GC content: 0.46  Zone: 2
Number of predicted genes: 1  In +chain: 1  In -chain: 0
Number of predicted exons: 6  In +chain: 6  In -chain: 0
Predicted genes and exons in var: 1  Max var= 10
GENE WEIGHT: 24.1
  G Str   Feature    Start       End   Weight   ORF-start    ORF-end
  1  +        TSS      355              7.43   TATA 327  wTATA 21.08  LDF 0.56
  1  +   1   CDSf      521  -    641    1.23       521  -     640
  1  +   2   CDSi     1066  -   1362    2.08      1068  -    1361
  1  +   3   CDSi     1860  -   2028    1.69      1862  -    2026
  1  +   4   CDSi     2637  -   2802    2.74      2638  -    2802
  1  +   5   CDSi     3558  -   3797    4.35      3558  -    3797
  1  +   6   CDSl     4131  -   4247    2.09      4131  -    4244
  1  +       PolA     4650              3.17
Predicted proteins:
>FGENES-M 1.5 ACU08131 1 Multiexon gene 521 - 4247 369 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGY
AFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKVDDGSELSSTSRTEVSS
VSNSSVSPA

FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 175701.1  Date: 19981005
Seq name: ACU08131
Length of sequence: 5392  GC content: 0.46  Zone: 2
Number of predicted genes: 1  In +chain: 1  In -chain: 0
Number of predicted exons: 6  In +chain: 6  In -chain: 0
Predicted genes and exons in var: 2  Max var= 10
GENE WEIGHT: 15.1
  G Str   Feature    Start       End   Weight   ORF-start    ORF-end
  1  +   1   CDSf      218  -    321    1.01       218  -     319
  1  +   2   CDSi      984  -   1023    1.94       986  -    1021
  1  +   3   CDSi     1860  -   2028    1.49      1862  -    2026
  1  +   4   CDSi     2675  -   2802    1.00      2676  -    2801
  1  +   5   CDSi     3558  -   3797    4.35      3558  -    3797
  1  +   6   CDSl     4131  -   4247    2.09      4131  -    4244
  1  +       PolA     4650              3.17
Predicted proteins:
>FGENES-M 1.5 ACU08131 1 Multiexon gene 218 - 4247 265 a Ch+
MRQGGGQITAQLRDKTFKGFEDLVLQVRGLIRLGGNLLVDVCVVIAILVSQLSGPWPLYL
GNAGSLSASPLEMSSSMPNWPWLALSSPGCGLLYGQHHPSLAGVDVFSGSDDPGVLSYMI
VLMITCCFIPLAVILLCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCW
GPYTVFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKV
DDGSELSSTSRTEVSSVSNSSVSPA

FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 175701.1  Date: 19981005
Seq name: ACU08131
Length of sequence: 5392  GC content: 0.46  Zone: 2
Number of predicted genes: 1  In +chain: 1  In -chain: 0
Number of predicted exons: 6  In +chain: 6  In -chain: 0
Predicted genes and exons in var: 3  Max var= 10
GENE WEIGHT: 14.3
  G Str   Feature    Start       End   Weight   ORF-start    ORF-end
  1  +        TSS      355              7.43   TATA 327  wTATA 21.08  LDF 0.56
  1  +   1   CDSf      521  -    641    1.23       521  -     640
  1  +   2   CDSi     1066  -   1362    2.08      1068  -    1361
  1  +   3   CDSi     1860  -   2028    1.69      1862  -    2026
  1  +   4   CDSi     2637  -   2802    2.74      2638  -    2802
  1  +   5   CDSi     3558  -   3870    0.78      3558  -    3869
  1  +   6   CDSl     4857  -   5131    2.37      4859  -    5128
  1  +       PolA     5187              0.77
Predicted proteins:
>FGENES-M 1.5 ACU08131 1 Multiexon gene 521 - 5131 446 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGY
AFHPLAAALPAYFAKSATIYNPIIYVFMNRQVIFCVPKWTVTGLARRVQKREGCMVFTGA
RECIEGGQEEEKFVPRGVCASAKSNALNLNSVESGHDSDTGRTNETQHDPPRSLQGLCAS
SQHGSTGTILYIVFDTKACCVPGTSS

FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 175701.1  Date: 19981005
Seq name: ACU08131
Length of sequence: 5392  GC content: 0.46  Zone: 2
Number of predicted genes: 1  In +chain: 1  In -chain: 0
Number of predicted exons: 6  In +chain: 6  In -chain: 0
Predicted genes and exons in var: 4  Max var= 10
GENE WEIGHT: 13.9
  G Str   Feature    Start       End   Weight   ORF-start    ORF-end
  1  +        TSS      355              7.43   TATA 327  wTATA 21.08  LDF 0.56
  1  +   1   CDSf      521  -    641    1.23       521  -     640
  1  +   2   CDSi     1066  -   1362    2.08      1068  -    1361
  1  +   3   CDSi     1860  -   2028    1.69      1862  -    2026
  1  +   4   CDSi     2637  -   2802    2.74      2638  -    2802
  1  +   5   CDSi     3558  -   3668    0.99      3558  -    3668
  1  +   6   CDSl     4131  -   4247    2.09      4131  -    4244
  1  +       PolA     4650              3.17
Predicted proteins:
>FGENES-M 1.5 ACU08131 1 Multiexon gene 521 - 4247 326 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTFRNCIMQLFGKK
VDDGSELSSTSRTEVSSVSNSSVSPA

FGENES-M 1.5.0 Prediction of several variants of multiple genes
Time: 175701.1  Date: 19981005
Seq name: ACU08131
Length of sequence: 5392  GC content: 0.46  Zone: 2
Number of predicted genes: 1  In +chain: 1  In -chain: 0
Number of predicted exons: 5  In +chain: 5  In -chain: 0
Predicted genes and exons in var: 5  Max var= 10
GENE WEIGHT: 13.0
  G Str   Feature    Start       End   Weight   ORF-start    ORF-end
  1  +        TSS      355              7.43   TATA 327  wTATA 21.08  LDF 0.56
  1  +   1   CDSf      521  -    641    1.23       521  -     640
  1  +   2   CDSi     1066  -   1362    2.08      1068  -    1361
  1  +   3   CDSi     1860  -   2028    1.69      1862  -    2026
  1  +   4   CDSi     2637  -   2802    2.74      2638  -    2802
  1  +   5   CDSl     3558  -   3875    2.10      3558  -    3872
  1  +       PolA     4650              3.17
Predicted proteins:
>FGENES-M 1.5 ACU08131 1 Multiexon gene 521 - 3875 356 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGY
AFHPLAAALPAYFAKSATIYNPIIYVFMNRQVIFCVPKWTVTGLARRVQKREGCMG
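The coding-exon coordinates in each variant account exactly for the reported protein length. For variant 1, the six CDS segments total 1110 nt, i.e. 370 codons; dropping the stop codon gives the 369 a reported in the protein header. The same arithmetic reproduces the lengths of variants 2-5 (265, 446, 326 and 356 a). The Python sketch below extracts the CDS coordinates from a variant's text and performs this check. It assumes, from the sample rather than from any format specification, that coordinates are 1-based and inclusive and that the last coding exon (CDSl) includes the stop codon: CDSl ends at 4247 while its ORF-end is 4244. The regex and function names are illustrative and not part of Fgenes-m.

```python
import re
from typing import List, Tuple

# Matches a coding-exon entry such as "CDSf 521 - 641" and captures start and end.
CDS_RE = re.compile(r"\bCDS[a-z]\s+(\d+)\s*-\s*(\d+)")

def cds_segments(report: str) -> List[Tuple[int, int]]:
    """Extract (start, end) coordinates of coding exons from one variant's text."""
    return [(int(a), int(b)) for a, b in CDS_RE.findall(report)]

def protein_length(segments: List[Tuple[int, int]]) -> int:
    """Protein length in amino acids.

    Assumption inferred from the sample: coordinates are 1-based inclusive and
    the last coding exon includes the stop codon.
    """
    coding_nt = sum(end - start + 1 for start, end in segments)
    if coding_nt % 3 != 0:
        raise ValueError(f"coding length {coding_nt} nt is not a multiple of 3")
    return coding_nt // 3 - 1  # subtract the stop codon

# Exon lines of variant 1, copied from the sample output.
variant1 = """
1 + 1 CDSf 521 - 641 1.23 521 - 640
1 + 2 CDSi 1066 - 1362 2.08 1068 - 1361
1 + 3 CDSi 1860 - 2028 1.69 1862 - 2026
1 + 4 CDSi 2637 - 2802 2.74 2638 - 2802
1 + 5 CDSi 3558 - 3797 4.35 3558 - 3797
1 + 6 CDSl 4131 - 4247 2.09 4131 - 4244
"""
segments = cds_segments(variant1)
print(len(segments), protein_length(segments))  # 6 369
```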