NNSSP

Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments

The input sequence for this program should be in FASTA format, with 80 or fewer sequence letters per line.
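The line-length requirement above can be met by re-wrapping a sequence before submission. The following is a minimal sketch, not part of NNSSP itself; the function name `to_fasta` and the record name passed to it are hypothetical.

```python
def to_fasta(name, sequence, width=80):
    """Return a FASTA record whose sequence lines hold at most `width` letters."""
    # Drop any whitespace or line breaks already present in the sequence.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    if width < 1:
        raise ValueError("width must be positive")
    # Slice the sequence into consecutive chunks of `width` letters.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_fasta("query", "RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA"), end="")
```

The default width of 80 matches the limit stated above; a smaller width such as 60 is equally acceptable.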

Method description:

Yi and Lander (*) developed a neural-network and nearest-neighbor method for predicting protein secondary structure, with a scoring system that combined a sequence similarity matrix with the local structural environment scoring scheme of Bowie et al. (**). We have improved their scoring system by treating the N- and C-terminal positions of a-helices and b-strands, and also b-turns, as distinctive types of secondary structure. Another improvement, which also significantly decreases the computation time, is to restrict the database to a smaller subset of proteins that are similar to the query sequence. Using multiple sequence alignments rather than single sequences, together with a simple jury decision method, we achieved an overall three-state accuracy of 72.2%, which is better than that observed for the most accurate multilayered neural network approach tested on the same data set of 126 non-homologous protein chains.

(*) Yi T-M., Lander E.S. (1993)
Protein secondary structure prediction using nearest-neighbor methods.
J. Mol. Biol., 232, 1117-1129.

(**) Bowie J.U., Luthy R., Eisenberg D. (1991)
A method to identify protein sequences that fold into a known three-dimensional structure.
Science, 253, 164-170.
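
The method description mentions a simple jury decision. How NNSSP forms its jury is not detailed here, so the sketch below only illustrates the general idea under stated assumptions: several equal-length per-residue predictions are combined by majority vote, and ties fall back to coil (c). The function name `jury_vote` and the tie-break rule are hypothetical.

```python
from collections import Counter


def jury_vote(predictions, tie_break="c"):
    """Combine equal-length per-residue predictions by majority vote."""
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    # Each column holds the states proposed for one residue.
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        winners = [state for state, n in counts.items() if n == best]
        # Assumption: an unresolved tie is reported as coil.
        consensus.append(winners[0] if len(winners) == 1 else tie_break)
    return "".join(consensus)


if __name__ == "__main__":
    print(jury_vote(["aaac", "aabc", "abbc"]))
```

Each input string could come, for example, from a predictor run with a different parameter setting or on a different aligned sequence; that choice is left open in this sketch.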

Accuracy:
Overall 3-state (a, b, c) prediction gives ~67.6% correctly predicted residues on 126 non-homologous proteins using the jack-knife test procedure. Using multiple sequence alignments instead of single sequences increases prediction accuracy to 72.2%.
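
The accuracy figures above are percentages of correctly predicted residues over the three states. A minimal sketch of that measure follows, assuming the predicted and observed structures are given as equal-length strings over a, b and c; the function name `q3_accuracy` is hypothetical, and the jack-knife procedure itself is not reproduced.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy strings for illustration only, not data from the 126-protein set.
    print(round(q3_accuracy("aabbcc", "aabccc"), 1))
```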

See also the "SSP" program.

Example of NNSSP output. The output reports the probabilities (Pa and Pb) of a and b structures on a 0-9 scale. The probability of c is approximately 10 - Pa - Pb.


ADENYLATE KINASE ISOENZYME-3, /GTP:AMP$
 L=  214 SS content: a-  0.43 b=  0.05 c=  0.52
                    10        20        30        40        50
 PredSS     aaaaaaa           aaaaaa         aaaaaaaa       aa
 AA seq     RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
 Prob a     99888651000001112244545422211111346775554221332335
 Prob b     00001221000001134422321222233221001110010101134443
                    60        70        80        90       100
 PredSS     aaaa        aaaaaaaaaaaaaaaa             aaaaaaaaa
 AA seq     KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
 Prob a     54543201110346789888877545553334210001113588888875
 Prob b     22221001210001111000000000111233410101110000000011
                   110       120       130       140       150
 PredSS         bb     aaaaaaaa   bb      bbbb
 AA seq     QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
 Prob a     32111111111466766643321110001100000000000111111111
 Prob b     12135643321222110122245531001478764210013333211101
                   160       170       180       190       200
 PredSS               aaaaaaaaaaaaaaaaaaaaaaa   bbb          a
 AA seq     PLVQREDDRPETVVKRLKAYEAQTEPVLEYYRKKGVLETFSGTETNKIWP
 Prob a     23433211146788999997765577888886621121111111123335
 Prob b     12321000001110000000000000000000101365542111111221
                   210
 PredSS     aaaaaaa
 AA seq     HVYAFLQTKLPQRS
 Prob a     46687764210111
 Prob b     22211110110001
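
The blocks above can be read back into per-residue lists, and the c probability recovered from the relation 10 - Pa - Pb. The sketch below assumes the layout shown in this example: row labels occupy the first 12 columns, and blank positions in the PredSS row stand for coil (c). The names `LABEL_WIDTH` and `parse_nnssp_rows` are hypothetical.

```python
LABEL_WIDTH = 12  # assumption: row labels occupy the first 12 columns


def parse_nnssp_rows(text):
    """Collect the four per-residue rows from NNSSP-style output blocks."""
    rows = {"PredSS": [], "AA seq": [], "Prob a": [], "Prob b": []}
    for line in text.splitlines():
        label = line[:LABEL_WIDTH].strip()
        if label in rows:
            rows[label].append(line[LABEL_WIDTH:].rstrip())
    seq = "".join(rows["AA seq"])
    # PredSS rows may lose trailing blanks, so pad each to its block length.
    # Assumption: a blank prediction means coil (c).
    pred = "".join(
        ss.ljust(len(aa)).replace(" ", "c")
        for ss, aa in zip(rows["PredSS"], rows["AA seq"])
    )
    prob_a = [int(d) for d in "".join(rows["Prob a"])]
    prob_b = [int(d) for d in "".join(rows["Prob b"])]
    # Pc is only approximate, so clamp it at zero.
    prob_c = [max(0, 10 - a - b) for a, b in zip(prob_a, prob_b)]
    return seq, pred, prob_a, prob_b, prob_c


if __name__ == "__main__":
    block = (
        "                   210\n"
        " PredSS     aaaaaaa\n"
        " AA seq     HVYAFLQTKLPQRS\n"
        " Prob a     46687764210111\n"
        " Prob b     22211110110001\n"
    )
    seq, pred, pa, pb, pc = parse_nnssp_rows(block)
    print(seq)
    print(pred)
    print(pc)
```

Header and ruler lines are skipped because their first 12 columns do not match any of the four row labels.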

Reference:
Salamov A.A., Solovyev V.V. (1995)
Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments.
J. Mol. Biol., 247, 11-15.