SSP
Prediction of a-helix and b-strand segments of globular proteins
Our segment-oriented method is designed to locate secondary structure elements. It uses linear discriminant analysis to assign segments of a given amino acid sequence to a particular type of secondary structure, taking into account the amino acid composition of the internal parts of segments as well as their terminal and adjacent regions. Four linear discriminant functions were constructed for recognition of short and long a-helix segments and short and long b-strand segments, respectively. These functions combine three characteristics: the hydrophobic moment, and the singlet and pair preferences of a segment for an a-helix or b-strand. To improve the prediction accuracy of the method, we also developed a simple version that accepts multiple sequence alignments as input in place of single sequences.
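As an illustration of the kind of feature described above, the following minimal Python sketch computes a hydrophobic moment for a segment and combines features in a generic linear discriminant. It is not the SSP implementation: the Kyte-Doolittle hydropathy scale, the per-residue turn angles, and the names hydrophobic_moment and linear_discriminant are assumptions made for the example, and any weights passed in are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values. ASSUMPTION: the SSP text does not say
# which hydrophobicity scale it uses; this one is chosen for illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg):
    """Mean hydrophobic moment of an uppercase one-letter segment.

    angle_deg is the assumed turn per residue, e.g. about 100 degrees
    for an a-helix and 160 to 180 degrees for a b-strand.
    """
    if not segment:
        raise ValueError("segment must not be empty")
    delta = math.radians(angle_deg)
    s = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(segment))
    c = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(segment))
    return math.hypot(s, c) / len(segment)


def linear_discriminant(features, weights, bias):
    """Generic linear discriminant: weighted sum of features plus a bias.

    The weights and bias are hypothetical; SSP's fitted values are not
    given in this document.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias
```

A segment would be assigned to a class when its discriminant value exceeds a chosen threshold; the threshold, like the weights, would have to come from training data.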
Overall three-state (a, b, c) prediction gives ~65.1% correctly predicted residues on 126 non-homologous proteins using the jack-knife test procedure. This accuracy is good when no homologous sequences are available: the method of Sander et al. (Rost, Sander, J. Mol. Biol., 1993, 232, 584-599) has about 71% accuracy when homologous sequences are used and about 61% without them. Analysis of the prediction results shows high prediction accuracy for long secondary structure segments: ~89% of a-helices longer than 8 residues and ~71% of b-strands longer than 6 residues are located, with probabilities of correct prediction of 0.82 and 0.78, respectively. Using mean values of the discriminant functions over the aligned sequences of homologous proteins, we achieved a prediction accuracy of 68.2%. Our variant of the nearest-neighbor algorithm has 72% accuracy when multiple sequence alignments of homologous proteins are used and 67.6% accuracy without homologous proteins.
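The two quantities mentioned above, the percentage of correctly predicted residues and the mean of discriminant values over aligned sequences, can be sketched in Python as follows. This is an illustrative sketch only; the function names q3 and mean_discriminant and the list-of-lists input layout are assumptions, not part of SSP.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state (a, b or c)
    matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def mean_discriminant(scores_per_sequence):
    """Average discriminant values position by position over homologs.

    scores_per_sequence holds one list of per-position values for each
    aligned sequence (hypothetical layout); all lists have equal length.
    """
    n = len(scores_per_sequence)
    if n == 0:
        raise ValueError("at least one sequence is required")
    return [sum(column) / n for column in zip(*scores_per_sequence)]
```

For example, q3("aabbcc", "aabbca") counts five matching residues out of six.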
SEE ALSO NNSSP program.
Loading File Format:
(a) For a single sequence, load a file in the following format:
First line - sequence name,
Second line - the number 1 in format I5 (right-justified in 5 columns),
Third and subsequent lines - amino acid sequence.
The sequence length must be less than 2000 amino acids! Restrict the line length to 75 characters. You can use lowercase letters for Cys bridges, if you want.
Example:
ADENYLATE KINASE
    1
RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
PLVQREDDRPETVVK............
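A file in the single-sequence format can be produced with a short script. The Python sketch below follows the rules stated above (name line, the number 1 in I5, sequence lines of at most 75 characters, fewer than 2000 residues); the function name format_single_sequence and the default line width of 50 are assumptions chosen to match the example.

```python
def format_single_sequence(name, sequence, width=50):
    """Return the text of a single-sequence input file.

    width is the number of residues per line (hypothetical default of 50,
    as in the example); the format allows at most 75 characters per line.
    """
    seq = "".join(sequence.split())  # drop spaces and line breaks
    if not seq.isalpha():
        raise ValueError("sequence must contain letters only")
    if len(seq) >= 2000:
        raise ValueError("sequence length must be less than 2000")
    if not 1 <= width <= 75:
        raise ValueError("line length must be between 1 and 75")
    # Fortran I5: the integer 1 right-justified in a 5-column field.
    lines = [name, f"{1:5d}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

The returned string can be written to a file as is, for example with open("query.txt", "w").write(format_single_sequence("MY PROTEIN", seq)), where the file name is arbitrary.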
(b) For multiple aligned sequences:
First line - sequence name,
Second line - number of aligned sequences and length of the protein,
Third line - empty, or position numbers of the aligned amino acid sequences,
Subsequent lines - aligned amino acid sequences in format 60a1 (60 characters per line).
Parts of the aligned sequences must be separated by an empty line or a line with numbers. The number of aligned sequences must be less than 250. The alignment MUST be without gaps in the first (query) sequence!
Example:
ACTINOXANTHIN
5 107
        10        20        30        40        50        60
APAFSVSPASGASDGQSVSVSVAAAGETYYIAQaAPVGGQDAaNPATATSFTTDASGAAS
APAFSVSPASGLSDGQSVSVSGAAAGETYYIAQCAPVGGQDACNPATATSFTTDASGAAS
APTATVTPSSGLSDGTVVKVAGAgaGTAYDVGQCAWVdgVLACNPADFSSVTADANGSAS
APGVTVTPATGLSNGQTVTVSATgpGTVYHVGQCAVvpGVIGCDATTSTDVTADAAGKIT
ATPKSSSGGAGASTGSGTSSAAVTSgaASSAQQSGLQGATGAGGGSSSTPGTQPGSGAGG
        70        80        90       100
FSFTVRKSYAGQTPSGTPVGSVDbATDAbNLGAGNSGLNLGHVALTF
FSFV-RKSYAGZTPSGTPVGSVDCATDACNLGAGNSGLNLGHVALTF
TSLTVRRSFEGFLFDGTRWGTVDCTTAACQVGLSDAAGNGpgVAISF
AQLKVHSSFQAVvaNGTPWGTVNCKVVSCSAGLGSDSGEGAAQAITF
AIAARPVSAMGGtpPHTVPGSTNTTTTAMAGGVGGPgaNPNAAALM-
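The alignment rules above (fewer than 250 sequences, no gaps in the query, blocks of 60 columns separated by an empty line) can be checked and applied with a small script. The Python sketch below is illustrative only: the function name format_alignment is hypothetical, "-" is assumed to be the gap character as in the example, and the exact spacing of the two numbers on the second line is an assumption.

```python
def format_alignment(name, sequences, block=60):
    """Return the text of a multiple-alignment input file.

    sequences is a list of equal-length aligned strings; the first one
    is the query and must not contain gaps ("-").
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    if len(sequences) >= 250:
        raise ValueError("number of aligned sequences must be less than 250")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    if "-" in sequences[0]:
        raise ValueError("the first (query) sequence must not contain gaps")
    # Second line: number of sequences and protein length
    # (single-space separation is an assumption).
    lines = [name, f"{len(sequences)} {length}"]
    for start in range(0, length, block):
        lines.append("")  # empty line before each block of 60 columns
        lines.extend(s[start:start + block] for s in sequences)
    return "\n".join(lines) + "\n"
```

Using an empty separator line rather than a line of position numbers keeps the sketch short; the format accepts either.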
Example of SSP output:
ADENYLATE KINASE
10 20 30 40 50
pred A: aaaaaaaaa aaaaaaaaa aaaaaaaaa aaa
AA N 4.1 C N 2.2 C N 4.4 C N
pred B: bbbb
BB N2 C
Predic aaaaaaaaa bbbb aaaaaaaaa aaaaaaaaa aaa
a/acid RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA

60 70 80 90 100
pred A: aaaaaa aaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaa
AA 2.2 C N 4.2 CN 2.4 C N 5.4 C
pred B: bbbbbbb
BB N 2.6 C
Predic aaaaaa aaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaa
a/acid KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
The output of the prediction program presents not only the final optimal variant of the secondary structure assignment, but also a set of potential a-helix and b-strand segments that were computed without considering the competition between them. Because the protein secondary structure is finally stabilized during the formation of the tertiary structure, the alternative variants of the a-helix and b-strand segments may be important for methods of tertiary structure prediction.
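For downstream use, a per-residue state string can be turned into a list of segments. The Python sketch below is an illustration, not part of SSP: it assumes a string with one character per residue ("a" for helix, "b" for strand, anything else for coil), such as a "Predic" row read with its original column alignment and with the row label removed. The function name segments is hypothetical.

```python
from itertools import groupby


def segments(states):
    """List (type, start, end) for each run of 'a' or 'b' in states.

    Positions are 1-based and inclusive; characters other than 'a' and
    'b' (for example spaces) are treated as coil and skipped.
    """
    found = []
    position = 1
    for state, run in groupby(states):
        length = len(list(run))
        if state in ("a", "b"):
            found.append((state, position, position + length - 1))
        position += length
    return found
```

The same function can be applied to the "pred A" and "pred B" rows to collect the alternative segments alongside the final assignment.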
References:
Solovyev V.V., Salamov A.A. (1991) Method of calculation of discrete secondary structures in globular proteins. Molek. Biol. 25:810-824 (in Russian).
Solovyev V.V., Salamov A.A. (1994) Secondary structure prediction based on discriminant analysis. In: Computer Analysis of Genetic Macromolecules (eds. Kolchanov N.A., Lim H.A.), World Scientific, pp. 352-364.
Solovyev V.V., Salamov A.A. (1994) Predicting a-helix and b-strand segments of globular proteins. CABIOS 10(6):661-669.