SeqMatchSW-P description

The program implements the Smith-Waterman algorithm for local sequence alignment, which finds similar regions between two protein sequences. The approach is described in T. F. Smith and M. S. Waterman, "Identification of Common Molecular Subsequences", Journal of Molecular Biology, 147:195-197, 1981. The algorithm is a variation of the Needleman-Wunsch dynamic programming algorithm. It is guaranteed to find the optimal local alignment with respect to the scoring system in use (the substitution matrix and the gap-scoring scheme).
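A minimal Python sketch of the Smith-Waterman recurrence may help illustrate the idea. It assumes a linear gap penalty and a hypothetical toy scoring function (toy_score: +2 for identical residues, -1 otherwise) in place of a real substitution matrix. SeqMatchSW-P's actual matrix, gap scheme and score normalization are not described here, so its S values will differ from this sketch.

```python
from typing import Callable, List, Tuple


def toy_score(x: str, y: str) -> int:
    # Hypothetical scoring: +2 for identical residues, -1 otherwise.
    # A real run would look the pair up in a substitution matrix.
    return 2 if x == y else -1


def smith_waterman(a: str, b: str,
                   score: Callable[[str, str], int] = toy_score,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment of a and b, using a linear gap penalty."""
    n, m = len(a), len(b)
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1].
    H: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0,                                            # start a new local alignment
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,                            # gap in b
                H[i][j - 1] + gap,                            # gap in a
            )
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_i, best_j
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("AAGLSDW", "GLSDWTT"))  # (10, 'GLSDW', 'GLSDW')
```

Flooring every cell at 0 is what makes the alignment local: a poorly matching prefix is simply dropped instead of dragging the score down, and the traceback stops at the first zero cell.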

Example output:


 L:153        Sequence MYOGLOBIN MAP TURTLE vs. 19 Base sequences [C:\Documents and Settings\My Documents\MolQuestWorkSpace\example_data\SeqMatchSW-P\seq1.set.fa].
Total 19 sequences produce 19 significant alignment(s).

[DD]       7, S:      28.714, L:      153 MYOGLOBIN CHICKEN
[DD]      17, S:       27.56, L:      153  MYOGLOBIN HUMAN
[DD]       9, S:      27.482, L:      153 MYOGLOBIN N.AMERICAN OPOSSUM
[DD]       5, S:      26.354, L:      153 MYOGLOBIN SADDLEBACK DOLPHIN
[DD]       8, S:      12.825, L:      146 HEMOGLOBIN BETA CHICKEN
[DD]      13, S:      12.564, L:      141 HEMOGLOBIN ALPHA NILE CROCODILE
[DD]       6, S:      12.323, L:      140 HEMOGLOBIN BETA EDIBLE FROG
[DD]      10, S:      12.259, L:      146 HEMOGLOBIN BETA N.AMERICAN OPOSSUM
[DD]      19, S:      12.226, L:      146  HEMOGLOBIN BETA HUMAN
[DD]      11, S:      11.865, L:      141 HEMOGLOBIN ALPHA BULLFROG
[DD]      14, S:      11.713, L:      141 HEMOGLOBIN ALPHA OSTRICH
[DD]      15, S:      11.353, L:      141   HEMOGLOBIN ALPHA EASTERN GRAY KANGAROO
[DD]      18, S:      11.235, L:      141  HEMOGLOBIN ALPHA HUMAN
[DD]      16, S:       10.87, L:      142 HEMOGLOBIN ALPHA ABYSSINIAN HYRAX
[DD]      12, S:      10.849, L:      146 HEMOGLOBIN BETA NILE CROCODILE
[DD]       2, S:      8.2676, L:      161 HEMOGLOBIN I.PARASPONIA ANDERSONII
[DD]       1, S:      7.6599, L:      146 HEMOGLOBIN VITREOSCILLA SP.
[DD]       3, S:      6.1534, L:      153 LEGHEMOGLOBIN I. YELLOW LUPIN
[DD]       4, S:      5.4138, L:      143 LEGHEMOGLOBIN I.BROAD BEAN .
****************************************************************************
[DD] Sequence:       7(      1), S:      28.714, L:      153 MYOGLOBIN CHICKEN
Summ of block lengths: 153, Alignment bounds:
On first  sequence: start         1, end       153, length 153
On second sequence: start         1, end       153, length 153
Block of alignment: 1        
    1 P:         1         1 L:     153, G:  84.27, W: 874000, S:28.7142
        1 GLSDDEWHHVLGIWAKVEPDLSAHGQEVIIRLFQVHPETQERFAKFKNLKTIDELRSSEE
          ||||2||44||0||2|||1|552||4||55|||40||||05||0|||1|||05|662||5
        1 GLSDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGLKTPNEMKGSED

       61 VKKHGTTVLTALGRILKLKNNHEPELKPLAESHATKHKIPVKYLEFICEIIVKVIAEKHP
          4||||2||||1||6|||0|12||15|||||65|||||||||||||||1|7|7|||||||1
       61 LKKHGATVLTQLGKILKQKGQHESDLKPLAQTHATKHKIPVKYLEFISEVIIKVIAEKHA

      121 SDFGADSQAAMRKALELFRNDMASKYKEFGFQG
          5||||||||||6|||||||||||||||||||||
      121 ADFGADSQAAMKKALELFRNDMASKYKEFGFQG
[DD] Sequence:      17(      1), S:       27.56, L:      153  MYOGLOBIN HUMAN
Summ of block lengths: 153, Alignment bounds:
On first  sequence: start         1, end       153, length 153
On second sequence: start         1, end       153, length 153
Block of alignment: 1        
    1 P:         1         1 L:     153, G:  81.13, W: 830000, S:27.5604
        1 GLSDDEWHHVLGIWAKVEPDLSAHGQEVIIRLFQVHPETQERFAKFKNLKTIDELRSSEE
          ||||0||40||17|2|||1|512|||||5||||50||||0|6|0|||4||50||665||5
        1 GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASED

       61 VKKHGTTVLTALGRILKLKNNHEPELKPLAESHATKHKIPVKYLEFICEIIVKVIAEKHP
          4||||2|||||||0|||0|14||1|5||||6||||||||||||||||1|0|75|512|||
       61 LKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHP

      121 SDFGADSQAAMRKALELFRNDMASKYKEFGFQG
          2|||||5|2||1|||||||2||||2|||4||||
      121 GDFGADAQGAMNKALELFRKDMASNYKELGFQG
  ....

Where:

The first line is the header:


[DD] Sequence:       7(      1), S:      28.714, L:      153 MYOGLOBIN CHICKEN

[DD]	Has no meaning here; kept for output compatibility with nucleotide sequence alignment.
Sequence: 7( 1)	Order number of the sequence in the query set submitted for alignment. The number in brackets is the order number of this alignment for that sequence (relevant when the sequence produced more than one alignment). For example, 4( 5) denotes the fifth alignment of the fourth sequence in the set.
S	Score of this alignment.
L	Length of this query sequence
MYOGLOBIN CHICKEN	Name of this query sequence
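If the report needs to be post-processed, the header line can be parsed with a regular expression. The sketch below is an assumption based only on the example output above (field widths vary, so any amount of whitespace is accepted); HitHeader and parse_header are hypothetical names, not part of the program.

```python
import re
from typing import NamedTuple


class HitHeader(NamedTuple):
    seq_no: int     # order number of the sequence in the query set
    align_no: int   # order number of this alignment for that sequence
    score: float    # S
    length: int     # L, length of the query sequence
    name: str       # name of the query sequence


# Pattern inferred from the example output; spacing between fields varies.
HEADER_RE = re.compile(
    r"\[DD\]\s+Sequence:\s*(\d+)\(\s*(\d+)\),\s*S:\s*([-\d.]+),\s*L:\s*(\d+)\s+(.*\S)"
)


def parse_header(line: str) -> HitHeader:
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a hit header: {line!r}")
    seq_no, align_no, score, length, name = m.groups()
    return HitHeader(int(seq_no), int(align_no), float(score), int(length), name)


h = parse_header("[DD] Sequence:       7(      1), S:      28.714, L:      153 MYOGLOBIN CHICKEN")
print(h.seq_no, h.align_no, h.score, h.name)  # 7 1 28.714 MYOGLOBIN CHICKEN
```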

Additional information about the alignment:


Summ of block lengths: 153, Alignment bounds:
On first  sequence: start         1, end       153, length 153
On second sequence: start         1, end       153, length 153

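Summ of block lengths	Total length of all similarity blocks in the alignment.
start, end	Positions of the first and last aligned symbols in the target (first) and query (second) sequences.
length	The length covered by the alignment in the target and query sequences, respectively.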
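The bounds lines can be read the same way. This sketch assumes 1-based, inclusive coordinates, so that length should equal end - start + 1 (which holds for the 1..153 example); Bounds and parse_bounds are hypothetical names, and the consistency check is an assumption rather than documented behaviour.

```python
import re
from typing import NamedTuple


class Bounds(NamedTuple):
    which: str    # "first" (target) or "second" (query)
    start: int
    end: int
    length: int

    def is_inclusive_span(self) -> bool:
        # Assumption: 1-based inclusive coordinates, so length == end - start + 1.
        return self.length == self.end - self.start + 1


BOUNDS_RE = re.compile(
    r"On (first|second)\s+sequence:\s*start\s+(\d+),\s*end\s+(\d+),\s*length\s+(\d+)"
)


def parse_bounds(line: str) -> Bounds:
    m = BOUNDS_RE.search(line)
    if m is None:
        raise ValueError(f"not an alignment-bounds line: {line!r}")
    which, start, end, length = m.groups()
    return Bounds(which, int(start), int(end), int(length))


b = parse_bounds("On first  sequence: start         1, end       153, length 153")
print(b, b.is_inclusive_span())
```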

List of alignment blocks:


Block of alignment: 1        
    1 P:         1         1 L:     153, G:  84.27, W: 874000, S:28.7142

Block of alignment: 1 - the number of blocks. Each line below it describes one block:


    1 P:         1         1 L:     153, G:  84.27, W: 874000, S:28.7142

1	Block number.
P: 1 1	Start positions of the similarity block in the target and query sequences, respectively. Here the block starts at the first position of both sequences.
L: 153	Length of this similarity block.
G: 84.27	Homology of this similarity block.
W: 874000	Weight of this similarity block: the sum of the similarity values of the aligned symbol pairs, taken from the selected similarity matrix.
S:28.7142	Score of this similarity block.
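A block line can be split into the fields listed above with one regular expression. The pattern is inferred from the two example block lines only; Block and parse_block are hypothetical names, and the meaning of each field is exactly as described above (in particular, G is reported as given, not recomputed).

```python
import re
from typing import NamedTuple


class Block(NamedTuple):
    number: int         # block number
    target_start: int   # P, first value
    query_start: int    # P, second value
    length: int         # L
    homology: float     # G
    weight: int         # W
    score: float        # S


BLOCK_RE = re.compile(
    r"^\s*(\d+)\s+P:\s*(\d+)\s+(\d+)\s+L:\s*(\d+),\s*G:\s*([-\d.]+),"
    r"\s*W:\s*(-?\d+),\s*S:\s*([-\d.]+)\s*$"
)


def parse_block(line: str) -> Block:
    m = BLOCK_RE.match(line)
    if m is None:
        raise ValueError(f"not a block line: {line!r}")
    n, p1, p2, length, g, w, s = m.groups()
    return Block(int(n), int(p1), int(p2), int(length), float(g), int(w), float(s))


print(parse_block("    1 P:         1         1 L:     153, G:  84.27, W: 874000, S:28.7142"))
```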

Alignment:


        1 GLSDDEWHHVLGIWAKVEPDLSAHGQEVIIRLFQVHPETQERFAKFKNLKTIDELRSSEE
          ||||2||44||0||2|||1|552||4||55|||40||||05||0|||1|||05|662||5
        1 GLSDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGLKTPNEMKGSED

Line 1 - the target sequence. The number at the start of the line is the position of its first symbol. Uppercase letters mark similarity blocks; lowercase letters mark unaligned regions.
Line 2 - the separator line. "|" marks identical symbols; a digit from 0 to 9 gives the degree of similarity between non-identical symbols, from 0 (no similarity) to 9 (maximal similarity).
Line 3 - the query sequence, with the same conventions as line 1.
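To check a displayed row, the separator can be walked column by column against the two sequences. This sketch assumes gap-free rows and expects the separator with its leading indentation stripped; parse_seq_line and summarize_row are hypothetical helpers. The plain identity count it returns is not necessarily the quantity the program reports as G.

```python
from typing import Dict, Tuple


def parse_seq_line(line: str) -> Tuple[int, str]:
    """Split a sequence line such as '   61 LKKHG...' into (position, residues)."""
    pos, residues = line.split()
    return int(pos), residues


def summarize_row(target: str, sep: str, query: str) -> Dict[str, int]:
    """Count '|' identities and digit-graded columns in one display row."""
    if not len(target) == len(sep) == len(query):
        raise ValueError("target, separator and query rows must be equally long")
    identities = graded = 0
    for t, s, q in zip(target, sep, query):
        if s == "|":
            # '|' should only appear where the two symbols are identical.
            if t.upper() != q.upper():
                raise ValueError(f"'|' over non-identical symbols {t}/{q}")
            identities += 1
        elif s.isdigit():
            graded += 1  # similarity grade 0 (none) .. 9 (maximal)
    return {"columns": len(sep), "identities": identities, "graded": graded}


_, t = parse_seq_line("        1 GLSDDEWHHVLGIWAKVEPDLSAHGQEVIIRLFQVHPETQERFAKFKNLKTIDELRSSEE")
s = "||||2||44||0||2|||1|552||4||55|||40||||05||0|||1|||05|662||5"
_, q = parse_seq_line("        1 GLSDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGLKTPNEMKGSED")
print(summarize_row(t, s, q))
```

For the first row of the MYOGLOBIN CHICKEN alignment this reports 36 identical columns and 24 graded ones out of 60.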