SSP

Prediction of a-helix and b-strand segments of globular proteins

Method description:

Our segment-oriented method is designed to locate secondary structure elements. It uses linear discriminant analysis to assign segments of a given amino acid sequence to a particular type of secondary structure, taking into account the amino acid composition of the internal parts of segments as well as of their terminal and adjacent regions. Four linear discriminant functions were constructed, for recognition of short and long a-helix segments and of short and long b-strand segments. These functions combine three characteristics: the hydrophobic moment and the segment singlet and pair preferences for an a-helix or b-strand. To improve prediction accuracy, a simple version of the method has also been developed that takes multiple sequence alignments as input in place of single sequences.
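The way a segment score could be assembled from such characteristics can be sketched as follows. This is only an illustration, not the SSP implementation: the Kyte-Doolittle hydropathy scale, the Eisenberg-style moment formula, the function names and all weights are assumptions that do not come from the description above.

```python
import math

# Kyte-Doolittle hydropathy values: an assumed scale, not named in the text above.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg):
    """Eisenberg-style hydrophobic moment of an uppercase segment.

    angle_deg is the assumed rotation per residue, for example about
    100 degrees for an a-helix and 160-180 degrees for a b-strand.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(HYDROPATHY[aa] * math.sin(i * delta) for i, aa in enumerate(segment))
    cos_sum = sum(HYDROPATHY[aa] * math.cos(i * delta) for i, aa in enumerate(segment))
    return math.hypot(sin_sum, cos_sum)


def discriminant_score(features, weights, bias):
    """Generic linear discriminant: weighted sum of the features plus a bias."""
    return sum(w * f for w, f in zip(weights, features)) + bias


# Hypothetical usage: the two preference values and all weights are placeholders.
moment = hydrophobic_moment("RLLRAIMGA", 100.0)
score = discriminant_score([moment, 0.3, 0.1], [0.5, 1.0, 1.0], -1.0)
```

In a scheme of this kind, a segment would be accepted as a candidate a-helix or b-strand when its score exceeds a threshold chosen on training data.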

Accuracy:

Overall three-state (a, b, c) prediction gives ~65.1% correctly predicted residues on 126 non-homologous proteins using the jack-knife test procedure. This accuracy is good when no homologous sequences are available: the method of Rost and Sander (Rost, Sander, J. Mol. Biol., 1993, 232, 584-599) has about 71% accuracy when homologous sequences are used and about 61% without them. Analysis of the prediction results shows high prediction accuracy for long secondary structure segments: ~89% of a-helices of length greater than 8 and ~71% of b-strands of length greater than 6 are located, with probabilities of correct prediction of 0.82 and 0.78, respectively. Using mean values of the discriminant functions over the aligned sequences of homologous proteins, we achieved a prediction accuracy of 68.2%. Our variant of the nearest-neighbor algorithm has 72% accuracy when multiple sequence alignments of homologous proteins are used and 67.6% accuracy without homologous proteins.
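For readers who want to reproduce a per-residue figure of this kind, the sketch below computes the percentage of correctly predicted residues from two equal-length state strings over a, b and c. The function name is hypothetical, and the exact counting rules used in the published tests are not given here.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Five of six residues agree in this toy example.
accuracy = three_state_accuracy("aabbcc", "aabbca")
```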

See also the NNSSP program.

Loading File Format:

(a) For a single sequence, load a file in the following format:
First line - sequence name,
Second line - the number 1 in format I5 (right-justified in the first five columns, as in the example),
Third and subsequent lines - amino acid sequence.

Sequence length must be less than 2000 amino acids! Restrict the line length to 75 characters. You can use lowercase letters for Cys bridges if you want.

Example:


ADENYLATE KINASE 
    1 
RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA 
KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY 
QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE 
PLVQREDDRPETVVK............ 
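A file in this single-sequence layout can be produced with a few lines of Python. This is a minimal sketch: the function name and the default of 50 residues per line (taken from the example, and within the 75-character limit) are assumptions.

```python
def format_single_sequence(name, sequence, width=50):
    """Return the text of a single-sequence input file.

    Line 1 is the name, line 2 is the number 1 right-justified in five
    columns (format I5), and the remaining lines hold the sequence.
    """
    if len(sequence) >= 2000:
        raise ValueError("sequence length must be less than 2000")
    if not 0 < width <= 75:
        raise ValueError("line length must be between 1 and 75 characters")
    lines = [name, f"{1:5d}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


# Hypothetical usage with a made-up name and a short fragment.
text = format_single_sequence("TEST PROTEIN", "RLLRAIMGAPGSGKGTVSSRITKHFELKHL")
```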

(b) For multiple aligned sequences:
First line - sequence name,
Second line - number of aligned sequences and length of the protein,
Third line - empty, or position numbers for the aligned amino acid sequences,
Subsequent lines - aligned amino acid sequences in format 60a1.

Parts of the aligned sequences must be separated by an empty line or a line with numbers. The number of aligned sequences must be less than 250. The alignment MUST be without gaps in the first (query) sequence!

Example:


ACTINOXANTHIN 
5 107 
        10        20        30        40        50        60
APAFSVSPASGASDGQSVSVSVAAAGETYYIAQaAPVGGQDAaNPATATSFTTDASGAAS 
APAFSVSPASGLSDGQSVSVSGAAAGETYYIAQCAPVGGQDACNPATATSFTTDASGAAS 
APTATVTPSSGLSDGTVVKVAGAgaGTAYDVGQCAWVdgVLACNPADFSSVTADANGSAS 
APGVTVTPATGLSNGQTVTVSATgpGTVYHVGQCAVvpGVIGCDATTSTDVTADAAGKIT 
ATPKSSSGGAGASTGSGTSSAAVTSgaASSAQQSGLQGATGAGGGSSSTPGTQPGSGAGG
        70        80        90       100 
FSFTVRKSYAGQTPSGTPVGSVDbATDAbNLGAGNSGLNLGHVALTF 
FSFV-RKSYAGZTPSGTPVGSVDCATDACNLGAGNSGLNLGHVALTF 
TSLTVRRSFEGFLFDGTRWGTVDCTTAACQVGLSDAAGNGpgVAISF
AQLKVHSSFQAVvaNGTPWGTVNCKVVSCSAGLGSDSGEGAAQAITF 
AIAARPVSAMGGtpPHTVPGSTNTTTTAMAGGVGGPgaNPNAAALM-
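The alignment layout can be generated and checked in the same way. The sketch below is an assumption-laden illustration: the function name is hypothetical, the header is written as two space-separated integers as in the example, and empty lines (rather than lines of position numbers) are used as separators.

```python
def format_alignment(name, sequences, width=60):
    """Return the text of a multiple-alignment input file.

    sequences[0] is the query; all sequences must already be aligned to it.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    if len(sequences) >= 250:
        raise ValueError("number of aligned sequences must be less than 250")
    query = sequences[0]
    if "-" in query:
        raise ValueError("the first (query) sequence must not contain gaps")
    if any(len(s) != len(query) for s in sequences):
        raise ValueError("all aligned sequences must have the query length")
    lines = [name, f"{len(sequences)} {len(query)}"]
    for start in range(0, len(query), width):
        lines.append("")  # empty line before every part of the alignment
        lines.extend(s[start:start + width] for s in sequences)
    return "\n".join(lines) + "\n"
```

Checking for gaps in the query before writing the file catches the most common reason for an alignment to be rejected.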

Example of SSP output:


   ADENYLATE KINASE     
                    10        20        30        40        50
   pred A:    aaaaaaaaa          aaaaaaaaa     aaaaaaaaa     aaa
   AA         N  4.1  C          N  2.2  C     N  4.4  C     N  
   pred B:                  bbbb                                
   BB                       N2 C                                
   Predic     aaaaaaaaa     bbbb aaaaaaaaa     aaaaaaaaa     aaa
   a/acid     RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
                    60        70        80        90       100
   pred A:    aaaaaa       aaaaaaaaaaaaaaaaaaaaaaa     aaaaaaaaa
   AA         2.2  C       N    4.2    CN   2.4  C     N  5.4  C
   pred B:                 bbbbbbb                              
   BB                      N 2.6 C                              
   Predic     aaaaaa       aaaaaaaaaaaaaaaaaaaaaaa     aaaaaaaaa
   a/acid     KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY

The output of the prediction program presents not only the final optimal secondary structure assignment, but also a set of potential a-helix and b-strand segments that were computed without considering the competition between them. Because the secondary structure of a protein is finally stabilized only during the formation of its tertiary structure, these alternative a-helix and b-strand segments may be important for methods of tertiary structure prediction.
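The final assignment can be pulled out of a listing like the one above with a small parser. This sketch assumes things inferred only from the example: that every row carries a label in its first 14 columns, that the "Predic" row precedes its "a/acid" row, and that blank positions stand for the third state c. The function name is hypothetical.

```python
def parse_ssp_output(text, offset=14):
    """Return (sequence, states) joined over all blocks of an SSP listing."""
    residues, states = [], []
    pending = None
    for line in text.splitlines():
        label = line[:offset].strip()
        body = line[offset:]
        if label == "Predic":
            pending = body  # final assignment row for the current block
        elif label == "a/acid":
            seq = body.rstrip()
            residues.append(seq)
            # Pad or trim the assignment row to the block length, blanks become c.
            row = (pending or "").ljust(len(seq))[:len(seq)]
            states.append(row.replace(" ", "c"))
            pending = None
    return "".join(residues), "".join(states)
```

The "pred A:" and "pred B:" rows could be collected in the same way if the alternative segments are needed for tertiary structure work.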

References:

Solovyev V.V., Salamov A.A. Method of calculation of discrete secondary structures in globular proteins. Molek. Biol. 25:810-824, 1991 (in Russ.)

Solovyev V.V., Salamov A.A. Secondary structure prediction based on discriminant analysis. In: Computer Analysis of Genetic Macromolecules (eds. Kolchanov N.A., Lim H.A.), World Scientific, 1994, p.352-364.

Solovyev V.V., Salamov A.A. Predicting a-helix and b-strand segments of globular proteins. CABIOS (1994), V.10,6,661-669