SeqMatchSW-N

The program implements the Smith-Waterman algorithm for local sequence alignment, which finds similar regions between two nucleotide sequences. The approach is described in T.F. Smith and M.S. Waterman, "Identification of Common Molecular Subsequences", Journal of Molecular Biology, 147:195-197, 1981. The algorithm is a variation of the Needleman-Wunsch dynamic programming algorithm. It is guaranteed to find the optimal local alignment with respect to the scoring system being used (which includes the substitution matrix and the gap-scoring scheme).
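
For orientation, the dynamic programming idea can be sketched in a few lines of Python. This is a minimal illustration only, not the SeqMatchSW-N implementation: the function name smith_waterman and the scoring values (match=2, mismatch=-1, and a linear gap penalty of -2) are assumptions chosen for the example, whereas the program itself works with a substitution matrix and its own gap-scoring scheme.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    The scoring values are illustrative assumptions, not program defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option lets an alignment start anywhere (local alignment).
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGGACGTGGG"))  # (8, 'ACGT', 'ACGT')
```

With these assumed scores, the only region the two toy sequences share is ACGT, so the best local alignment covers exactly those four symbols.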

The program is provided with a viewer.

Example of output:


L:999        Sequence gi|1418273|gb|U60902.1|OCU60902 Otolemur crassicaudatus epsilon-, gamma-, delta-, and beta-globin genes, complete cds, and eta-globin pseudogene
 vs C:\Documents and Settings\My Documents\MolQuestWorkSpace\example_data\SeqMatchSW-N\1\seq1.fa
Total 1 sequences produce 1 significant alignment(s).

[DD]       1, S:      8.4023, L:      292 gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11
****************************************************************************
[DD] Sequence:       1(      1), S:      8.4023, L:      292 gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11
Summ of block lengths: 55, Alignment bounds:
On first  sequence: start       834, end       889, length 56
On second sequence: start       140, end       194, length 55
Block of alignment: 2        
    1 P:       834       140 L:      12, G:  83.33, W:     42, S:4.32049
    2 P:       847       152 L:      43, G:  74.42, W:    116, S:7.31564
        1 attaatagttgacag(..)ttacattttctgagtTATACTTCCAGCtACTCAGGAGGCCG
          ...............(..)...............|0||0|||||||.|||000||||00|
      125 ---------------(..)gtggtggctcatgtcTGTAATTCCAGC-ACTGGAGAGGTAG

      860 AAATGGGAGGATCCCTTGAGCTCAGGAGGTcaaggctgcagtgag(..)caaaaaactgc
          ||0||||||||000||||||||||0|||0|...............(..)...........
      165 AAGTGGGAGGACTGCTTGAGCTCAAGAGTTtgatattatcctgga(..)gca--------

      996 tccg
          ....
      293 ----
....

Where:

The first line is the header:


[DD] Sequence:       1(      1), S:      8.4023, L:      292 gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11
[DD] - target sequence in direct chain (D), query sequence in direct chain (D). Variants:
[DR] - target sequence in direct chain (D), query sequence in reverse chain (R).
[RD] - target sequence in reverse chain (R), query sequence in direct chain (D).
[RR] - target sequence in reverse chain (R), query sequence in reverse chain (R).
Sequence:  1(  1) - order number of the sequence in the query set submitted for alignment. The number in brackets is the order number of this alignment for that sequence (if the sequence produced more than one alignment). For example, 4(      5) is the fifth alignment of the fourth sequence in the set.
S - score of this alignment.
L - length of this query sequence.
gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11 - name of this query sequence.
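
For scripts that post-process the output, the header fields described above can be extracted with a regular expression. The sketch below is an assumption based only on the sample header shown here; the names HEADER_RE and parse_header are hypothetical, and other output variants may need a looser pattern.

```python
import re

# Pattern inferred from the sample header line; not an official format spec.
HEADER_RE = re.compile(
    r"^\[(?P<chains>[DR]{2})\]\s+Sequence:\s*(?P<seq_no>\d+)"
    r"\(\s*(?P<aln_no>\d+)\),\s*S:\s*(?P<score>-?[\d.]+)"
    r",\s*L:\s*(?P<length>\d+)\s+(?P<name>.*)$"
)


def parse_header(line):
    """Split an alignment header line into its fields, or return None."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    return {
        "target_chain": m["chains"][0],   # D = direct, R = reverse
        "query_chain": m["chains"][1],
        "sequence_number": int(m["seq_no"]),
        "alignment_number": int(m["aln_no"]),
        "score": float(m["score"]),
        "query_length": int(m["length"]),
        "query_name": m["name"],
    }


header = ("[DD] Sequence:       1(      1), S:      8.4023, L:      292 "
          "gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11")
fields = parse_header(header)
print(fields["target_chain"], fields["query_chain"], fields["query_length"])  # D D 292
```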

Additional information about alignment:


Summ of block lengths: 55, Alignment bounds:
On first  sequence: start       834, end       889, length 56
On second sequence: start       140, end       194, length 55
length - the length covered by the alignment in the target and query sequences, respectively.
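
In the sample above, each reported length equals end - start + 1, which suggests that both bounds are inclusive. The short check below illustrates this arithmetic; the names BOUNDS_RE and parse_bounds are hypothetical, and treating the bounds as inclusive in all output is an assumption drawn from this one example.

```python
import re

# Pattern inferred from the two sample "On ... sequence" lines.
BOUNDS_RE = re.compile(
    r"^On\s+(?P<which>first|second)\s+sequence:\s*start\s+(?P<start>\d+)"
    r",\s*end\s+(?P<end>\d+),\s*length\s+(?P<length>\d+)\s*$"
)


def parse_bounds(line):
    """Return (which, start, end, length) from a bounds line, or None."""
    m = BOUNDS_RE.match(line.strip())
    if m is None:
        return None
    return m["which"], int(m["start"]), int(m["end"]), int(m["length"])


for line in (
    "On first  sequence: start       834, end       889, length 56",
    "On second sequence: start       140, end       194, length 55",
):
    which, start, end, length = parse_bounds(line)
    # Inclusive bounds: the covered length counts both end points.
    print(which, end - start + 1 == length)
```

Both lines of the sample satisfy the check (889 - 834 + 1 = 56 and 194 - 140 + 1 = 55).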

List of alignment blocks:


Block of alignment: 2        
    1 P:       834       140 L:      12, G:  83.33, W:     42, S:4.32049
    2 P:       847       152 L:      43, G:  74.42, W:    116, S:7.31564

Block of alignment: 2 - the number of blocks. Each line below corresponds to one block:


     1 P:       834       140 L:      12, G:  83.33, W:     42, S:4.32049
1 - block number.
P:    834   140 - start positions of the similarity block in the target and query sequences, respectively.
L: 12 - length of this similarity block.
G: 83.33 - homology of this similarity block.
W: 42 - weight of this similarity block (the arithmetic sum of the symbols' similarity values, taken from the given similarity matrix).
S:4.32049 - score of this similarity block.
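
The block lines can be read programmatically in the same spirit. The sketch below parses the two sample block lines and confirms that their lengths add up to the reported "Summ of block lengths: 55". The names BLOCK_RE and parse_block are hypothetical, and the pattern is inferred from this sample only.

```python
import re

# Pattern for one block line, inferred from the sample output.
BLOCK_RE = re.compile(
    r"^\s*(?P<num>\d+)\s+P:\s*(?P<target_start>\d+)\s+(?P<query_start>\d+)"
    r"\s+L:\s*(?P<length>\d+),\s*G:\s*(?P<homology>[\d.]+)"
    r",\s*W:\s*(?P<weight>-?\d+),\s*S:\s*(?P<score>-?[\d.]+)\s*$"
)


def parse_block(line):
    """Return the fields of one similarity block line, or None."""
    m = BLOCK_RE.match(line)
    if m is None:
        return None
    return {
        "num": int(m["num"]),
        "target_start": int(m["target_start"]),
        "query_start": int(m["query_start"]),
        "length": int(m["length"]),
        "homology": float(m["homology"]),
        "weight": int(m["weight"]),
        "score": float(m["score"]),
    }


lines = [
    "    1 P:       834       140 L:      12, G:  83.33, W:     42, S:4.32049",
    "    2 P:       847       152 L:      43, G:  74.42, W:    116, S:7.31564",
]
blocks = [parse_block(line) for line in lines]
total_length = sum(b["length"] for b in blocks)
print(total_length)  # 55
```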

Alignment:


        1 attaatagttgacag(..)ttacattttctgagtTATACTTCCAGCtACTCAGGAGGCCG
          ...............(..)...............|0||0|||||||.|||000||||00|
      125 ---------------(..)gtggtggctcatgtcTGTAATTCCAGC-ACTGGAGAGGTAG


Line 1 - target sequence. Capital letters mark blocks of similarity; lower case marks unaligned regions.
Line 2 - separator line. Separator symbols: "|" - exact match between symbols. Digits give the degree of similarity between symbols, from 0 (no similarity) to 9 (maximal similarity).
Line 3 - query sequence. Capital letters mark blocks of similarity; lower case marks unaligned regions.
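
The three alignment lines can also be inspected in code. The sketch below extracts the similarity blocks (runs of capital letters) from a sequence line and computes the share of "|" symbols in a block's separator. In this sample that share reproduces the G values of both blocks (10 of 12 gives 83.33, 32 of 43 gives 74.42), but the program's exact definition of G is not stated here, so treat the match as an observation. The function names are hypothetical.

```python
import re


def similarity_blocks(row):
    """Return the runs of capital letters (similarity blocks) in a sequence line."""
    return re.findall(r"[A-Z]+", row)


def block_identity(separator):
    """Percent of exact matches ('|') among the scored separator symbols.

    Only '|' and digits are counted; '.' and other symbols are skipped.
    """
    scored = [c for c in separator if c == "|" or c.isdigit()]
    if not scored:
        return 0.0
    matches = sum(1 for c in scored if c == "|")
    return 100.0 * matches / len(scored)


target_row = "ttacattttctgagtTATACTTCCAGCtACTCAGGAGGCCG"
print(similarity_blocks(target_row))  # ['TATACTTCCAGC', 'ACTCAGGAGGCCG']
print(round(block_identity("|0||0|||||||"), 2))  # 83.33
```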