SeqMatchNW-N

The program implements the Needleman-Wunsch algorithm to produce a global alignment of two nucleotide sequences. The approach is described in Needleman S.B., Wunsch C.D. (1970) "A general method applicable to the search for similarities in the amino acid sequence of two proteins", J Mol Biol. 48(3):443-53. The Needleman-Wunsch algorithm uses dynamic programming and is guaranteed to find the alignment with the maximum score under the scoring system being used (which includes the substitution matrix and the gap-scoring scheme).
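The dynamic-programming recurrence behind the method can be illustrated with a short Python sketch. This is not the SeqMatchNW-N source: it is a textbook Needleman-Wunsch with a simple linear gap penalty, and the scores are assumptions. Match +5 and mismatch -4 are consistent with the block weights W in the example output below, while the gap penalty of -5 is purely hypothetical; the program itself may use a different gap-scoring scheme.

```python
def needleman_wunsch(a: str, b: str, match: int = 5, mismatch: int = -4, gap: int = -5):
    """Global alignment with a linear gap penalty (all scores are illustrative assumptions)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, needleman_wunsch("ACGT", "AGT") returns (10, "ACGT", "A-GT"): three matches (+15) and one gap (-5).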

The program is provided with a viewer.

Example output:


L:999        Sequence gi|1418273|gb|U60902.1|OCU60902 Otolemur crassicaudatus epsilon-, gamma-, delta-, and beta-globin genes, complete cds, and eta-globin pseudogene
 vs C:\Documents and Settings\My Documents\MolQuestWorkSpace\example_data\SeqMatchNW-N\1\seq1.fa
Total 1 sequences produce 1 significant alignment(s).

[DD]       1, S:      14.962, L:      292 gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11
****************************************************************************
[DD] Sequence:       1(      1), S:      14.962, L:      292 gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11
Summ of block lengths: 251, Alignment bounds:
On first  sequence: start         1, end       940, length 940
On second sequence: start         2, end       292, length 291
Block of alignment: 37       
    1 P:         1         2 L:       1, G: 100.00, W:      5, S:1
    2 P:        33         3 L:       4, G: 100.00, W:     20, S:2.82843
    3 P:        41         7 L:       4, G: 100.00, W:     20, S:2.82843
    4 P:        58        11 L:       3, G: 100.00, W:     15, S:2.32379
    5 P:       101        14 L:       7, G:  71.43, W:     17, S:2.50185
    6 P:       117        26 L:      13, G:  76.92, W:     38, S:4.02492
    7 P:       141        39 L:       3, G: 100.00, W:     15, S:2.32379
    8 P:       149        42 L:       3, G: 100.00, W:     15, S:2.32379
    9 P:       168        55 L:       9, G:  77.78, W:     27, S:3.30748
   10 P:       201        64 L:      13, G:  61.54, W:     20, S:2.83235
   11 P:       231        77 L:       4, G: 100.00, W:     20, S:2.82843
   12 P:       245        81 L:       3, G: 100.00, W:     15, S:2.32379
   13 P:       255        84 L:       4, G: 100.00, W:     20, S:2.82843
   14 P:       273        88 L:       8, G:  75.00, W:     22, S:2.92119
   15 P:       290        98 L:       8, G:  62.50, W:     13, S:2.19089
   16 P:       304       106 L:      11, G:  90.91, W:     46, S:4.64372
   17 P:       320       121 L:      10, G:  70.00, W:     23, S:3
   18 P:       346       139 L:       9, G:  77.78, W:     27, S:3.30748
   19 P:       368       148 L:       6, G:  83.33, W:     21, S:2.85774
   20 P:       378       154 L:      10, G:  80.00, W:     32, S:3.66667
   21 P:       392       164 L:       4, G: 100.00, W:     20, S:2.82843
   22 P:       411       171 L:       8, G:  75.00, W:     22, S:2.92119
   23 P:       426       179 L:       9, G:  66.67, W:     18, S:2.61116
   24 P:       467       188 L:      10, G:  90.00, W:     41, S:4.33333
   25 P:       482       198 L:       5, G:  80.00, W:     16, S:2.4004
   26 P:       502       203 L:       3, G: 100.00, W:     15, S:2.32379
   27 P:       515       207 L:      12, G:  83.33, W:     42, S:4.32049
   28 P:       547       226 L:      12, G:  75.00, W:     33, S:3.70328
   29 P:       621       238 L:       7, G:  85.71, W:     26, S:3.27165
   30 P:       641       245 L:       7, G:  71.43, W:     17, S:2.50185
   31 P:       653       252 L:       3, G: 100.00, W:     15, S:2.32379
   32 P:       706       255 L:       6, G:  83.33, W:     21, S:2.85774
   33 P:       727       261 L:      17, G:  70.59, W:     40, S:4.10605
   34 P:       888       278 L:       5, G:  80.00, W:     16, S:2.4004
   35 P:       907       283 L:       5, G: 100.00, W:     25, S:3.27327
   36 P:       929       288 L:       2, G: 100.00, W:     10, S:1.73205
   37 P:       938       290 L:       3, G: 100.00, W:     15, S:2.32379
        1 -AttaatagttgacagggatttacactaatgttATTCatcaTAATatgggatgtatcgCT
          .|...............................||||....||||.............||
        1 gA-------------------------------ATTC----TAAT-------------CT

       60 Cattgttgtttatttg(..)gaagaaaagttaaatCATTTCAttctttgtgAAAGACATC
          |...............(..)...............|0|0|||.........|0||0||0|
       13 C---------------(..)---------------CCTCTCAaccct----ACAGTCACC

      126 CATTaacccaccctcTGGatcacTATgctttagcagtttcaaTGTAGGCTAgtaagcctg
          ||||...........|||.....|||................|||0|0|||.........
       35 CATT-----------TGG-----TATattaaagatg------TGTTGTCTA---------

....

Where:

The first line is the header:


[DD] Sequence:       1(      1), S:      14.962, L:      292 gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11
[DD] - target sequence in the direct chain (D), query sequence in the direct chain (D). Other possible codes:
[DR] - target sequence in the direct chain (D), query sequence in the reverse chain (R).
[RD] - target sequence in the reverse chain (R), query sequence in the direct chain (D).
[RR] - target sequence in the reverse chain (R), query sequence in the reverse chain (R).
Sequence:  1(  1) Order number of the sequence within the query set submitted for alignment. The number in brackets is the order number of this alignment for that sequence (relevant when the sequence produced more than one alignment). For example, 4(      5) is the fifth alignment of the fourth sequence in the set.
S Score of this alignment.
L Length of this query sequence.
gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11 Name of this query sequence
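To post-process results in a script, a header line can be split into the fields listed above. The Python sketch below is hypothetical: the function name parse_header and its regular expression are not part of SeqMatchNW-N and are based only on the layout of the sample header. It does not handle the shorter summary line that starts with "[DD]       1, S:".

```python
import re

# Hypothetical pattern, derived from the sample header line only.
HEADER_RE = re.compile(
    r"^\[(?P<strands>[DR]{2})\]\s+Sequence:\s*(?P<seq_no>\d+)\(\s*(?P<aln_no>\d+)\),"
    r"\s*S:\s*(?P<score>-?[\d.]+),\s*L:\s*(?P<length>\d+)\s+(?P<name>.+)$"
)


def parse_header(line: str) -> dict:
    """Split an alignment header line into its fields."""
    m = HEADER_RE.match(line.rstrip())
    if m is None:
        raise ValueError(f"not an alignment header: {line!r}")
    strands = m.group("strands")
    return {
        "target_chain": strands[0],  # first letter: target, D = direct, R = reverse
        "query_chain": strands[1],   # second letter: query
        "seq_no": int(m.group("seq_no")),   # order number of the query sequence
        "aln_no": int(m.group("aln_no")),   # order number of the alignment for it
        "score": float(m.group("score")),
        "length": int(m.group("length")),   # length of the query sequence
        "name": m.group("name"),
    }
```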

Additional information about the alignment:


Summ of block lengths: 251, Alignment bounds:
On first  sequence: start         1, end       940, length 940
On second sequence: start         2, end       292, length 291
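Summ of block lengths - Total length of all similarity blocks, i.e. the sum of their L values (251 in this example).
start, end, length - First and last positions covered by the alignment and the length of that region (end - start + 1), in the target (first) and query (second) sequences respectively.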
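A hypothetical Python helper (parse_bounds is not part of the program) that reads one of the bounds lines and checks that the reported length equals the inclusive span end - start + 1, as it does in the example:

```python
import re

# Hypothetical pattern, based on the sample "On ... sequence:" lines.
BOUNDS_RE = re.compile(
    r"On\s+(?P<which>first|second)\s+sequence:\s*start\s+(?P<start>\d+),"
    r"\s*end\s+(?P<end>\d+),\s*length\s+(?P<length>\d+)"
)


def parse_bounds(line: str) -> tuple:
    """Return (which, start, end, length) for one alignment-bounds line."""
    m = BOUNDS_RE.search(line)
    if m is None:
        raise ValueError(f"not a bounds line: {line!r}")
    start, end, length = int(m["start"]), int(m["end"]), int(m["length"])
    # The reported length is an inclusive span: both end points are counted.
    if length != end - start + 1:
        raise ValueError(f"length {length} does not match {start}..{end}")
    return m["which"], start, end, length
```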

List of alignment blocks:


Block of alignment: 37 
    1 P:         1         2 L:       1, G: 100.00, W:      5, S:1
    2 P:        33         3 L:       4, G: 100.00, W:     20, S:2.82843

Block of alignment: 37 - Number of blocks in this alignment.
Each following line describes one block. The fields of a single line are explained below:


    2 P:        33         3 L:       4, G: 100.00, W:     20, S:2.82843
2 Block number.
P:    33    3 Start positions of the similarity block in the target and query sequences, respectively.
L: 4 Length of this similarity block.
G: 100.00 Homology of this similarity block, as the percentage of identical positions.
W: 20 Weight of this similarity block (the sum of the pairwise symbol similarity scores taken from the selected similarity matrix).
S:2.82843 Score of this similarity block.
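The block fields can be cross-checked against each other. The Python sketch below parses a block line and recomputes W and S, but the relationships it uses are inferred from the sample output, not documented by the program: G is taken as percent identity, W is recomputed with an assumed match score of +5 and mismatch score of -4, and S is compared with the empirical expression (4m - L) / sqrt(3(L + 2)), where m is the number of identical positions. The rows of the example that were checked fit these relationships, but the program may compute them differently. The names parse_block and check_block are hypothetical.

```python
import math
import re

# Hypothetical pattern, based on the sample block lines.
BLOCK_RE = re.compile(
    r"^\s*(?P<num>\d+)\s+P:\s*(?P<t_pos>\d+)\s+(?P<q_pos>\d+)\s+L:\s*(?P<length>\d+),"
    r"\s*G:\s*(?P<identity>[\d.]+),\s*W:\s*(?P<weight>-?\d+),\s*S:\s*(?P<score>-?[\d.]+)\s*$"
)


def parse_block(line: str) -> dict:
    """Parse one line of the block list into numbers."""
    m = BLOCK_RE.match(line)
    if m is None:
        raise ValueError(f"not a block line: {line!r}")
    return {k: float(v) if k in ("identity", "score") else int(v)
            for k, v in m.groupdict().items()}


def check_block(block: dict, match: int = 5, mismatch: int = -4):
    """Recompute (matches, W, S) under the assumptions stated above."""
    length = block["length"]
    matches = round(block["identity"] * length / 100)  # G assumed to be percent identity
    weight = matches * match + (length - matches) * mismatch  # assumed +5/-4 scoring
    # Empirical fit to the sample output, not a documented formula.
    s_fit = (4 * matches - length) / math.sqrt(3 * (length + 2))
    return matches, weight, s_fit
```

For block 5 of the example (L: 7, G: 71.43), this gives 5 identical positions, W = 5*5 - 2*4 = 17 and S = 13 / sqrt(27), about 2.50185, matching the listed values.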

Alignment:


       60 Cattgttgtttatttg(..)gaagaaaagttaaatCATTTCAttctttgtgAAAGACATC
          |...............(..)...............|0|0|||.........|0||0||0|
       13 C---------------(..)---------------CCTCTCAaccct----ACAGTCACC

Line 1 - The target sequence. Capital letters mark similarity blocks; lower-case letters mark regions outside the similarity blocks. A "-" marks a gap.
Line 2 - Separator line. "|" marks an exact match between two symbols. Digits give the degree of similarity between the symbols, from 0 (no similarity) to 9 (maximal similarity). "." marks a column outside the similarity blocks.
Line 3 - The query sequence, with the same conventions as the target line.
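The separator line can be rebuilt from the two sequence lines. The Python sketch below is a simplification based only on the example above: it assumes identity scoring for nucleotides, so it emits only "|" (identical upper-case symbols), "0" (different upper-case symbols) and "." (every other column). Intermediate digits 1-8 and the "(..)" marker are not reproduced. The function name separator is hypothetical.

```python
def separator(target: str, query: str) -> str:
    """Rebuild a separator line from two aligned rows (identity-only sketch)."""
    if len(target) != len(query):
        raise ValueError("aligned rows must have equal length")
    out = []
    for t, q in zip(target, query):
        if t.isupper() and q.isupper():
            # Both symbols lie inside a similarity block.
            out.append("|" if t == q else "0")
        else:
            # Gap ("-") or lower-case symbol: column outside the blocks.
            out.append(".")
    return "".join(out)
```

Applied to the rows starting at target position 126 in the example, it reproduces the separator shown there, including "|||0|0|||" for the block TGTAGGCTA / TGTTGTCTA.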