CysRec

CysRec predicts the SS-bonding states of cysteines and locates disulphide bridges in proteins.

Methodology

Procedure: The query sequence is processed in the following steps.

  1. Secondary structure is predicted for the query sequence.
  2. For each cysteine, the amino acid fragment and the corresponding secondary structure fragment within ±10 positions are compared with the fragments of the training sets using a precomputed log-odds matrix, and the maximal score is determined for each set.
  3. For the same fragment, scores are calculated against profiles (weight matrices) constructed from positive (bonded) and negative (non-bonded) examples.
  4. The value of a linear discriminant function is calculated from the four most significant amino acid properties.
  5. The resulting score, computed as a linear combination of the five scores listed above, is used to recognize the SS-bonding state of each cysteine.
  6. A neural network calculates a score for each possible pair of cysteines, forming a 'Matrix of pair scores'.
  7. The pattern of bonded cysteine pairs is chosen as the one that maximizes the sum of the corresponding scores in the matrix.
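
The fragment extraction in step 2 can be illustrated with a minimal Python sketch. It is not the program's own code: the function name `cysteine_windows`, the `flank` and `pad` parameters, and the padding of fragments that run past the sequence ends are all assumptions made for the example.

```python
def cysteine_windows(sequence, flank=10, pad="-"):
    """Return {1-based position: fragment} for every cysteine in `sequence`.

    Each fragment spans `flank` residues on either side of the cysteine,
    so it is 2 * flank + 1 characters long. Positions beyond the sequence
    ends are filled with `pad` (an assumption, not documented behaviour).
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    windows = {}
    for i, residue in enumerate(seq):
        if residue == "C":
            # Residue i of `seq` sits at index i + flank in `padded`,
            # so this slice is centred on the cysteine.
            windows[i + 1] = padded[i:i + 2 * flank + 1]
    return windows


if __name__ == "__main__":
    # Toy sequence, used only to show the shape of the result.
    for position, fragment in cysteine_windows("MKTACDEFGHIKLMNPQRSTVWYCAG", flank=3).items():
        print(position, fragment)
```

With the default `flank=10` each fragment has 21 positions, matching the ±10 interval described in step 2.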

Input Format

A FASTA-formatted sequence split into lines of at most 80 positions is accepted.
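
A sequence can be brought into this layout with a few lines of Python. The sketch below is only an illustration; the helper name `to_fasta` and its parameters are hypothetical and not part of CysRec.

```python
def to_fasta(name, sequence, width=80):
    """Format one sequence as a FASTA record with lines of at most `width` characters."""
    if width < 1:
        raise ValueError("width must be positive")
    # Drop any whitespace or line breaks pasted in with the sequence.
    seq = "".join(sequence.split()).upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + name] + lines) + "\n"


if __name__ == "__main__":
    # Toy record: 100 residues are split into lines of 80 and 20.
    print(to_fasta("demo", "ACDEFGHIKL" * 10), end="")
```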

A specially prepared alignment with no gaps in the first sequence is also accepted.


Example of alignment:


T0129
    5  182

MLISHSDLNQQLKSAGIGFNATELHGFLSGLLCGGLKDQSWLPLLYQFSN
---SYSDFSQQLKTAGIALSAAELHGFLTGLICGGIHDQSWQPLLFQFTN
-LPTYPSLALALSQQAVALTPAEMHGLISGMLCGGSKDNGWQTLVHDLTN
----YDEMNRFLNQQGAGLTPAEMHGLISGMICGGNNDSSWQPLLHDLTN
----YNEMNQYLNQQGTGLTPAEMHGLISGMICGGNDDSSWLPLLHDLTN

DNHAYPTGLVQPVTELYEQISQTLSDVEGFTFELGLTEDENVFTQADSLS
ENHAYPTALLQEVTQIQQHISKKLADIDGFDFELWLPENEDVFTRADALS
EGVAFPQALSLPLQQLHEATQEALEN-EGFMFQLLIPEGEDVFDRADALS
EGLAFGHELAQALRKMHAATSDALED-DGFLFQLYLPEDVSVFDRADALA
EGMAFGHELAQALRKMHSATSDALQD-DGFLFQLYLPDDVSVFDRADALA

DWANQFLLGIGLAQPELAKEKGEIGEAVDDLQDICQLGYDEDDNEEELAE
EWTNHFLLGLGLAQPKLDKEKGDIGEAIDDLHDICQLGYDESDDKEELSE
GWVNHFLLGLGMLQPKLAQVKDEVGEAIDDLRNIAQLGYDEDEDQEELAQ
GWVNHFLLGLGVTQPKLDKVTGETGEAIDDLRNIAQLGYDESEDQEELEM
GWVNHFLLGLGVTQPKLDKVTGETGEAIDDLRNIAQLGYDEDEDQEELEM

ALEEIIEYVRTIAMLFYSHFNEGEIESKPVLH
ALEEIIEYVRTLACLLFTHFQPQLPEQKPVLH
SLEEVVEYVRVAAILCHIEFTQQKPTAKPTLH
SLEEIIEYVRVAALLCHDTFTRQQPTAKPTLH
SLEEIIEYVRVAALLCHDTFTHPQPTAKPTLH
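
The layout of the example can be checked before submission with a short Python sketch. The interpretation used here is inferred from the example alone and is an assumption: a name line, a line giving the number of sequences and the alignment length (5 and 182 above), then blocks in which row k of every block belongs to sequence k. The function name `parse_alignment` is hypothetical.

```python
def parse_alignment(text):
    """Parse a block-interleaved alignment laid out like the example above.

    Returns (name, sequences). Raises ValueError if the rows do not match
    the declared sequence count or length, or if the first sequence
    contains gap characters.
    """
    content = [line.strip() for line in text.splitlines() if line.strip()]
    name = content[0]
    n_seq, length = (int(token) for token in content[1].split())
    rows = content[2:]
    if len(rows) % n_seq != 0:
        raise ValueError("row count is not a multiple of the sequence count")
    sequences = [""] * n_seq
    for index, row in enumerate(rows):
        # Row k of each block is appended to sequence k.
        sequences[index % n_seq] += row
    for seq in sequences:
        if len(seq) != length:
            raise ValueError("sequence length differs from the declared length")
    if "-" in sequences[0]:
        raise ValueError("the first sequence must not contain gaps")
    return name, sequences


if __name__ == "__main__":
    # Toy alignment with 2 sequences of length 6 in two blocks.
    toy = "T1\n    2  6\n\nACD\n-CD\n\nEFG\nEF-\n"
    print(parse_alignment(toy))
```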

Output Format

The output contains the query sequence, the positions of cysteines predicted to form disulphide bonds, the matrix of pair scores, the results of the SS-bonding state predictions, and the most probable pattern of pairs.


Example of output:



CYS_REC  Version 2. Recognition of SS-bounded cysteines

>1AC5_
 length=483
LPSSEEYKVAYELLPGLSEVPDPSNIPQMHAGHIPLRSEDADEQDSSDLEYFFWKFTNNDSNGNVDRPLIIWLNGGPGCSS
MDGALVESGPFRVNSDGKLYLNEGSWISKGDLLFIDQPTGTGFSVEQNKDEGKIDKNKFDEDLEDVTKHFMDFLENYFKIF
PEDLTRKIILSGESYAGQYIPFFANAILNHNKFSKIDGDTYDLKALLIGNGWIDPNTQSLSYLPFAMEKKLIDESNPNFKH
LTNAHENCQNLINSASTDEAAHFSYQECENILNLLLSYTRESSQKGTADCLNMYNFNLKDSYPSCGMNWPKDISFVSKFFS
TPGVIDSLHLDSDKIDHWKECTNSVGTKLSNPISKPSIHLLPGLLESGIEIVLFNGDKDLICNNKGVLDTIDNLKWGGIKG
FSDDAVSFDWIHKSKSTDDSEEFSGYVKYDRNLTFVSVYNASHMVPFDKSLVSRGIVDIYSNDVMIIDNNGKNVMITT

7 cysteines are found in positions:   79  251  271  293  308  345  386


Matrix of pair scores
 POS:    79   251   271   293   308   345
   79: -999   -21    -4     8    18   143
  251:  -21  -999   155     7    -3   -12
  271:   -4   155  -999    13   -20   -15
  293:    8     7    13  -999   133    -8
  308:   18    -3   -20   133  -999    -7
  345:  143   -12   -15    -8    -7  -999
CYS     79 is SS-bounded               Score=  56.7
CYS    251 is SS-bounded               Score=  53.2
CYS    271 is SS-bounded               Score=  47.0
CYS    293 is SS-bounded               Score=  68.1
CYS    308 is SS-bounded               Score=  63.9
CYS    345 is SS-bounded               Score=  60.7
CYS    386 is not SS-bounded           Score= -70.7

The most probable pattern of pairs: 79-345, 251-271, 293-308,
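
The reported pattern agrees with choosing the pairing that maximizes the sum of pair scores (step 7 of the procedure). The Python sketch below checks this on the matrix from the example using an exhaustive search. The search method and the name `best_pairing` are assumptions for illustration; the optimisation actually used by the program is not described here.

```python
POSITIONS = [79, 251, 271, 293, 308, 345]
SCORES = [
    [-999,  -21,   -4,    8,   18,  143],
    [ -21, -999,  155,    7,   -3,  -12],
    [  -4,  155, -999,   13,  -20,  -15],
    [   8,    7,   13, -999,  133,   -8],
    [  18,   -3,  -20,  133, -999,   -7],
    [ 143,  -12,  -15,   -8,   -7, -999],
]


def best_pairing(indices, scores):
    """Return (total, pairs) for the complete pairing with the largest score sum.

    `indices` must contain an even number of row/column indices. The search
    is exhaustive, which is adequate for the handful of cysteines in a
    typical protein.
    """
    if len(indices) % 2 != 0:
        raise ValueError("an even number of cysteines is required")
    if not indices:
        return 0, []
    first, rest = indices[0], indices[1:]
    best_total, best_pairs = None, None
    for k, partner in enumerate(rest):
        # Pair `first` with `partner`, then pair up the remaining indices.
        sub_total, sub_pairs = best_pairing(rest[:k] + rest[k + 1:], scores)
        total = scores[first][partner] + sub_total
        if best_total is None or total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs


if __name__ == "__main__":
    total, pairs = best_pairing(list(range(len(POSITIONS))), SCORES)
    print(total, [(POSITIONS[a], POSITIONS[b]) for a, b in pairs])
```

For this matrix the search selects 79-345, 251-271 and 293-308, with a score sum of 143 + 155 + 133 = 431.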

Performance: 3000 positive and 3000 negative examples (i.e. ±10 fragments surrounding bonded and non-bonded cysteines) were prepared from PDB sequences that were not used in training. On this control set, the combined function recognized SS-bonding states with an accuracy of ~90%.