SeqMatchNW-P

The program implements the Needleman-Wunsch algorithm to produce a global alignment of two protein sequences. The approach is described in Needleman SB, Wunsch CD (1970), "A general method applicable to the search for similarities in the amino acid sequence of two proteins", J Mol Biol. 48(3):443-53. The algorithm uses dynamic programming and is guaranteed to find the alignment with the maximum score under the scoring system in use (the substitution matrix and the gap-scoring scheme).
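
As an illustration of the dynamic programming idea, here is a minimal Python sketch of Needleman-Wunsch global alignment. It is not the program's own code: the function name needleman_wunsch, the simple match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, whereas the program itself scores with a substitution matrix and its own gap scheme.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b (illustrative scoring only)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("GLSDDEWHHVLG", "GLSDQEWQQVLT")
    print(s)
    print(x)
    print(y)
```

Because every cell is filled from its three neighbours, the bottom-right cell holds the maximum score over all global alignments for the chosen scoring parameters.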

The program is provided with a viewer.

Example of output:


L:153        Sequence MYOGLOBIN MAP TURTLE 
vs. 19 Base sequences [C:\Documents and Settings\My Documents\MolQuestWorkSpace\example_data\SeqMatchNW-P\seq1.set.fa].
Total 19 sequences produce 19 significant alignment(s).

[DD]       7, S:      28.714, L:      153 MYOGLOBIN CHICKEN
[DD]      17, S:       27.56, L:      153  MYOGLOBIN HUMAN
[DD]       9, S:      27.482, L:      153 MYOGLOBIN N.AMERICAN OPOSSUM
[DD]       5, S:      26.354, L:      153 MYOGLOBIN SADDLEBACK DOLPHIN
[DD]       8, S:      12.825, L:      146 HEMOGLOBIN BETA CHICKEN
[DD]      13, S:      12.696, L:      141 HEMOGLOBIN ALPHA NILE CROCODILE
[DD]      10, S:      12.388, L:      146 HEMOGLOBIN BETA N.AMERICAN OPOSSUM
[DD]       6, S:      12.271, L:      140 HEMOGLOBIN BETA EDIBLE FROG
[DD]      19, S:      12.226, L:      146  HEMOGLOBIN BETA HUMAN
[DD]      11, S:      11.998, L:      141 HEMOGLOBIN ALPHA BULLFROG
[DD]      14, S:      11.864, L:      141 HEMOGLOBIN ALPHA OSTRICH
[DD]      12, S:      11.533, L:      146 HEMOGLOBIN BETA NILE CROCODILE
[DD]      15, S:      11.521, L:      141   HEMOGLOBIN ALPHA EASTERN GRAY KANGAROO
[DD]      18, S:      11.401, L:      141  HEMOGLOBIN ALPHA HUMAN
[DD]      16, S:      11.095, L:      142 HEMOGLOBIN ALPHA ABYSSINIAN HYRAX
[DD]       2, S:      9.9819, L:      161 HEMOGLOBIN I.PARASPONIA ANDERSONII
[DD]       1, S:      9.4062, L:      146 HEMOGLOBIN VITREOSCILLA SP.
[DD]       3, S:      8.1196, L:      153 LEGHEMOGLOBIN I. YELLOW LUPIN
[DD]       4, S:      6.8096, L:      143 LEGHEMOGLOBIN I.BROAD BEAN .
****************************************************************************
[DD] Sequence:       7(      1), S:      28.714, L:      153 MYOGLOBIN CHICKEN
Summ of block lengths: 153, Alignment bounds:
On first  sequence: start         1, end       153, length 153
On second sequence: start         1, end       153, length 153
Block of alignment: 1        
    1 P:         1         1 L:     153, G:  84.27, W: 874000, S:28.7142
        1 GLSDDEWHHVLGIWAKVEPDLSAHGQEVIIRLFQVHPETQERFAKFKNLKTIDELRSSEE
          ||||2||44||0||2|||1|552||4||55|||40||||05||0|||1|||05|662||5
        1 GLSDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGLKTPNEMKGSED

       61 VKKHGTTVLTALGRILKLKNNHEPELKPLAESHATKHKIPVKYLEFICEIIVKVIAEKHP
          4||||2||||1||6|||0|12||15|||||65|||||||||||||||1|7|7|||||||1
       61 LKKHGATVLTQLGKILKQKGQHESDLKPLAQTHATKHKIPVKYLEFISEVIIKVIAEKHA

      121 SDFGADSQAAMRKALELFRNDMASKYKEFGFQG
          5||||||||||6|||||||||||||||||||||
      121 ADFGADSQAAMKKALELFRNDMASKYKEFGFQG
[DD] Sequence:      17(      1), S:       27.56, L:      153  MYOGLOBIN HUMAN
Summ of block lengths: 153, Alignment bounds:
On first  sequence: start         1, end       153, length 153
On second sequence: start         1, end       153, length 153
Block of alignment: 1        
    1 P:         1         1 L:     153, G:  81.13, W: 830000, S:27.5604
        1 GLSDDEWHHVLGIWAKVEPDLSAHGQEVIIRLFQVHPETQERFAKFKNLKTIDELRSSEE
          ||||0||40||17|2|||1|512|||||5||||50||||0|6|0|||4||50||665||5
        1 GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASED

       61 VKKHGTTVLTALGRILKLKNNHEPELKPLAESHATKHKIPVKYLEFICEIIVKVIAEKHP
          4||||2|||||||0|||0|14||1|5||||6||||||||||||||||1|0|75|512|||
       61 LKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHP

      121 SDFGADSQAAMRKALELFRNDMASKYKEFGFQG
          2|||||5|2||1|||||||2||||2|||4||||
      121 GDFGADAQGAMNKALELFRKDMASNYKELGFQG

....

Where:

The first line is the header:


[DD] Sequence:       7(      1), S:      28.714, L:      153 MYOGLOBIN CHICKEN
[DD] Has no meaning here; it is kept for output compatibility with nucleotide sequence alignment.
Sequence: 7(    1) Ordinal number of the sequence in the query set submitted for alignment. The number in brackets is the ordinal number of this alignment for that sequence (if the sequence produced more than one alignment). For example, 4(      5) is the fifth alignment of the fourth sequence in the set.
S Score of this alignment.
L Length of this query sequence.
MYOGLOBIN CHICKEN Name of this query sequence.
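
For scripted post-processing, the header fields described above can be pulled out with a regular expression. The following Python sketch assumes the header always has the layout shown in the example; the function name parse_header and the dictionary keys are hypothetical and not part of the program.

```python
import re

# Pattern for a header such as:
# [DD] Sequence:       7(      1), S:      28.714, L:      153 MYOGLOBIN CHICKEN
HEADER_RE = re.compile(
    r"\[DD\]\s+Sequence:\s*(\d+)\(\s*(\d+)\),\s*S:\s*([\d.]+),\s*L:\s*(\d+)\s+(.*)"
)


def parse_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    return {
        "sequence_index": int(m.group(1)),   # ordinal number in the query set
        "alignment_index": int(m.group(2)),  # ordinal number of the alignment
        "score": float(m.group(3)),          # S
        "length": int(m.group(4)),           # L
        "name": m.group(5).strip(),          # sequence name
    }


if __name__ == "__main__":
    print(parse_header(
        "[DD] Sequence:       7(      1), S:      28.714, L:      153 MYOGLOBIN CHICKEN"
    ))
```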

Additional information about alignment:


Summ of block lengths: 153, Alignment bounds:
On first  sequence: start         1, end       153, length 153
On second sequence: start         1, end       153, length 153
length The length covered by the alignment in the target and query sequences, respectively.

List of alignment blocks:


Block of alignment: 1        
    1 P:         1         1 L:     153, G:  81.13, W: 830000, S:27.5604

Block of alignment: 1 - the number of blocks. Each line below corresponds to one block:


    1 P:         1         1 L:     153, G:  81.13, W: 830000, S:27.5604	

1 Block number.
P: 1    1 Start positions of the similarity block in the target and query sequences, respectively. In this case, the block starts at the first position in both sequences.
L:    153 Length of this similarity block.
G: 81.13 Homology of this similarity block.
W: 830000 Weight of this similarity block (the arithmetic sum of the symbol similarities, calculated from the given similarity matrix).
S:27.5604 Score of this similarity block.
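
A block line can be split into the fields listed above in the same way. This Python sketch assumes the layout of the example line; the name parse_block and the dictionary keys are hypothetical.

```python
import re

# Pattern for a block line such as:
#     1 P:         1         1 L:     153, G:  81.13, W: 830000, S:27.5604
BLOCK_RE = re.compile(
    r"\s*(\d+)\s+P:\s*(\d+)\s+(\d+)\s+L:\s*(\d+),"
    r"\s*G:\s*([\d.]+),\s*W:\s*(\d+),\s*S:\s*([\d.]+)"
)


def parse_block(line):
    """Return the block fields as a dict, or None if the line does not match."""
    m = BLOCK_RE.match(line)
    if m is None:
        return None
    return {
        "block": int(m.group(1)),         # block number
        "target_start": int(m.group(2)),  # first number after P:
        "query_start": int(m.group(3)),   # second number after P:
        "length": int(m.group(4)),        # L
        "homology": float(m.group(5)),    # G
        "weight": int(m.group(6)),        # W
        "score": float(m.group(7)),       # S
    }


if __name__ == "__main__":
    print(parse_block(
        "    1 P:         1         1 L:     153, G:  81.13, W: 830000, S:27.5604"
    ))
```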

Alignment:


        1 GLSDDEWHHVLGIWAKVEPDLSAHGQEVIIRLFQVHPETQERFAKFKNLKTIDELRSSEE
          ||||2||44||0||2|||1|552||4||55|||40||||05||0|||1|||05|662||5
        1 GLSDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGLKTPNEMKGSED

Line 1 - The target sequence. Capital letters correspond to blocks of similarity, lower case to unaligned regions.
Line 2 - Separator line. "|" marks an exact match between symbols. A digit gives the degree of similarity between the two symbols, from 0 (no similarity) to 9 (maximal similarity).
Line 3 - The query sequence. Capital letters correspond to blocks of similarity, lower case to unaligned regions.
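
The separator line can also be summarized programmatically. The Python sketch below counts exact matches ("|") and collects the similarity digits of the remaining positions; the function name summarize_separator and the returned keys are hypothetical, and the sketch only counts symbols without reproducing any of the program's own statistics.

```python
def summarize_separator(sep):
    """Count exact matches and collect similarity digits from a separator line."""
    identical = sep.count("|")
    digits = [int(ch) for ch in sep if ch.isdigit()]
    return {
        "identical": identical,         # positions marked "|"
        "substituted": len(digits),     # positions marked with a digit 0-9
        "similarity_digits": digits,    # the digits in order of appearance
        "other": len(sep) - identical - len(digits),  # anything else, e.g. spaces
    }


if __name__ == "__main__":
    example = "||||2||44||0||2|||1|552||4||55|||40||||05||0|||1|||05|662||5"
    print(summarize_separator(example))
```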