SeqMatchSW-N description

The program implements the Smith-Waterman algorithm for local sequence alignment, that is, finding similar regions between two nucleotide sequences. The approach is described in T. F. Smith and M. S. Waterman, "Identification of Common Molecular Subsequences", Journal of Molecular Biology, 147:195-197, 1981. The algorithm is a variation of the Needleman-Wunsch dynamic programming algorithm for global alignment. It is guaranteed to find the optimal local alignment with respect to the scoring system being used (the substitution matrix and the gap-scoring scheme).
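The recurrence behind the algorithm can be sketched in a few lines. This is a minimal Python sketch with a linear gap penalty; the scores (match +5, mismatch -4, gap -8) and the function name smith_waterman are illustrative assumptions, not SeqMatchSW-N's actual code or scoring system, whose gap-scoring scheme is not described in this document. Each cell of H holds the best score of a local alignment ending at that pair of positions, and the floor of 0 is what lets a local alignment start anywhere; the traceback starts at the highest cell and stops at the first zero.

```python
def smith_waterman(a, b, match=5, mismatch=-4, gap=-8):
    """Best local alignment of strings a and b (linear gap penalty).

    Returns (score, aligned_a, aligned_b, start_a, start_b); starts are 1-based.
    Scoring values are illustrative assumptions, not SeqMatchSW-N settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment here
                H[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # a[i-1] opposite a gap
                H[i][j - 1] + gap,    # b[j-1] opposite a gap
            )
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until a zero cell ends the local alignment
    i, j = best_i, best_j
    out_a, out_b = [], []
    while H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b)), i + 1, j + 1
```

For example, smith_waterman("ACGTACGTAC", "ACGTCGTAC") aligns the whole second sequence against the first, placing one gap opposite the extra A.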

Example of output:


L:999        Sequence gi|1418273|gb|U60902.1|OCU60902 Otolemur crassicaudatus epsilon-, gamma-, delta-, and beta-globin genes, complete cds, and eta-globin pseudogene
 vs C:\Documents and Settings\My Documents\MolQuestWorkSpace\example_data\SeqMatchSW-N\1\seq1.fa
Total 1 sequences produce 1 significant alignment(s).

[DD]       1, S:      8.4023, L:      292 gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11
****************************************************************************
[DD] Sequence:       1(      1), S:      8.4023, L:      292 gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11
Summ of block lengths: 55, Alignment bounds:
On first  sequence: start       834, end       889, length 56
On second sequence: start       140, end       194, length 55
Block of alignment: 2        
    1 P:       834       140 L:      12, G:  83.33, W:     42, S:4.32049
    2 P:       847       152 L:      43, G:  74.42, W:    116, S:7.31564
        1 attaatagttgacag(..)ttacattttctgagtTATACTTCCAGCtACTCAGGAGGCCG
          ...............(..)...............|0||0|||||||.|||000||||00|
      125 ---------------(..)gtggtggctcatgtcTGTAATTCCAGC-ACTGGAGAGGTAG

      860 AAATGGGAGGATCCCTTGAGCTCAGGAGGTcaaggctgcagtgag(..)caaaaaactgc
          ||0||||||||000||||||||||0|||0|...............(..)...........
      165 AAGTGGGAGGACTGCTTGAGCTCAAGAGTTtgatattatcctgga(..)gca--------

      996 tccg
          ....
      293 ----
....

Where:

The first line is the header:


[DD] Sequence:       1(      1), S:      8.4023, L:      292 gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11

[DD]	Chains of the target and query sequences: here the target sequence is in the direct chain (D) and the query sequence is in the direct chain (D). Other variants: [DR] - target in the direct chain (D), query in the reverse chain (R); [RD] - target in the reverse chain (R), query in the direct chain (D); [RR] - target and query both in the reverse chain (R).
Sequence: 1( 1)	Order number of the sequence within the query set submitted for alignment. The number in parentheses is the order number of this alignment for that sequence (relevant when the sequence produced more than one alignment). Example: 4( 5) is the fifth alignment of the fourth sequence in the set.
S	Score of this alignment.
L	Length of this query sequence.
gi|455025|gb|U01317.1|HUMHBB Human beta globin region on chromosome 11	Name of this query sequence.
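The header fields can also be read programmatically. The following Python sketch parses one header line with a regular expression; the pattern, the function name parse_header and the AlignmentHeader class are hypothetical, inferred only from the example output above, and may not cover every variant SeqMatchSW-N can print.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern, inferred from the example header line only
HEADER_RE = re.compile(
    r"^\[(?P<target_strand>[DR])(?P<query_strand>[DR])\]\s+"
    r"Sequence:\s*(?P<seq_no>\d+)\(\s*(?P<aln_no>\d+)\),\s*"
    r"S:\s*(?P<score>-?[\d.]+),\s*"
    r"L:\s*(?P<length>\d+)\s+(?P<name>.+)$"
)


@dataclass
class AlignmentHeader:
    target_strand: str  # "D" (direct chain) or "R" (reverse chain)
    query_strand: str   # "D" or "R"
    seq_no: int         # order number of the query sequence in the set
    aln_no: int         # order number of this alignment for that sequence
    score: float        # S: score of the alignment
    length: int         # L: length of the query sequence
    name: str           # query sequence name


def parse_header(line):
    """Parse one '[XY] Sequence: ...' header line into an AlignmentHeader."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a header line: {line!r}")
    d = m.groupdict()
    return AlignmentHeader(
        target_strand=d["target_strand"],
        query_strand=d["query_strand"],
        seq_no=int(d["seq_no"]),
        aln_no=int(d["aln_no"]),
        score=float(d["score"]),
        length=int(d["length"]),
        name=d["name"],
    )
```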

Additional information about the alignment:


Summ of block lengths: 55, Alignment bounds:
On first  sequence: start       834, end       889, length 56
On second sequence: start       140, end       194, length 55

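Summ of block lengths	Sum of the lengths of all similarity blocks (12 + 43 = 55 in the example).
start, end	Positions where the alignment starts and ends in the first (target) and second (query) sequences.
length	The length covered by the alignment in the target and query sequences, respectively (end - start + 1).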
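As a sketch of how these numbers relate, assuming the blocks are listed in order and the alignment runs from the start of the first block to the end of the last one, the bounds can be recomputed from the P: and L: values of the block list. The function name alignment_bounds is hypothetical; with the two blocks of the example it reproduces 55, 834-889 (length 56) and 140-194 (length 55).

```python
def alignment_bounds(blocks):
    """blocks: list of (target_start, query_start, length) tuples taken from
    the P: and L: fields of the block list, in order of position.

    Returns (sum_of_block_lengths, target_bounds, query_bounds), where each
    bounds tuple is (start, end, length) with inclusive coordinates.
    """
    total = sum(length for _, _, length in blocks)
    t_start, q_start, _ = blocks[0]
    t_last, q_last, last_len = blocks[-1]
    # Assumption: the alignment ends where the last block ends
    t_end = t_last + last_len - 1
    q_end = q_last + last_len - 1
    return (
        total,
        (t_start, t_end, t_end - t_start + 1),
        (q_start, q_end, q_end - q_start + 1),
    )
```

The difference between the two lengths (56 versus 55) comes from the gap between the blocks: the target position 846 has no counterpart in the query.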

List of alignment blocks:


Block of alignment: 2        
    1 P:       834       140 L:      12, G:  83.33, W:     42, S:4.32049
    2 P:       847       152 L:      43, G:  74.42, W:    116, S:7.31564

Block of alignment: 2 is the number of blocks. Each line below it describes one block:


     1 P:       834       140 L:      12, G:  83.33, W:     42, S:4.32049


1	Block number.
P: 834 140	Start positions of the similarity block in the target and query sequences, respectively.
L: 12	Length of this similarity block.
G: 83.33	Homology of this similarity block, given as the percentage of identical positions (in block 1, 10 of the 12 positions are identical: 10/12 = 83.33%).
W: 42	Weight of this similarity block (the arithmetic sum of the symbol similarities taken from the given similarity matrix).
S: 4.32049	Score of this similarity block.
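The G and W values can be checked against the aligned symbols of a block. The sketch below assumes a simple nucleotide matrix that scores +5 for identical symbols and -4 otherwise; this matrix is an assumption, although it reproduces both blocks of the example (10 * 5 - 2 * 4 = 42 and 32 * 5 - 11 * 4 = 116). The function name block_stats is hypothetical. S is not reproduced, because its formula is not given here.

```python
def block_stats(target_block, query_block, match=5, mismatch=-4):
    """Return (G, W) for one gap-free similarity block.

    G: percentage of identical positions, rounded to 2 decimals.
    W: sum of per-position similarities under an ASSUMED +5/-4 matrix.
    """
    if len(target_block) != len(query_block):
        raise ValueError("both halves of a block must have the same length")
    pairs = list(zip(target_block.upper(), query_block.upper()))
    identical = sum(1 for t, q in pairs if t == q)
    g = round(100.0 * identical / len(pairs), 2)
    w = sum(match if t == q else mismatch for t, q in pairs)
    return g, w
```

For block 1 of the example, block_stats("TATACTTCCAGC", "TGTAATTCCAGC") gives G = 83.33 and W = 42, matching the block list.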




Alignment:

        1 attaatagttgacag(..)ttacattttctgagtTATACTTCCAGCtACTCAGGAGGCCG
          ...............(..)...............|0||0|||||||.|||000||||00|
      125 ---------------(..)gtggtggctcatgtcTGTAATTCCAGC-ACTGGAGAGGTAG



Line 1 - target sequence. Capital letters mark similarity blocks; lower case marks positions that are not part of a block.

Line 2 - separator line. "|" marks identical symbols. Digits give the degree of similarity between the two symbols, from 0 (no similarity) to 9 (maximal similarity). "." marks positions outside the similarity blocks, including gap positions.

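Line 3 - query sequence. Capital letters mark similarity blocks; lower case marks positions that are not part of a block.

The number at the start of each line is the sequence position at which that line starts. "-" marks a gap (no symbol of that sequence at this position), and "(..)" marks a stretch of unaligned flanking sequence that is omitted from the display.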
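The separator line can be rebuilt from the target and query lines. The Python sketch below assumes that every pair of different upper-case symbols gets a single fixed digit (0, as in the nucleotide example above); a full implementation would take the digit from the similarity matrix. The function name separator_line is hypothetical. Applied to the part of the first display line after "(..)", it reproduces the separator shown in the example.

```python
def separator_line(target, query, mismatch_digit="0"):
    """Build the middle line of the alignment display.

    '|'             both symbols are upper case and identical
    mismatch_digit  both upper case but different (ASSUMED fixed digit)
    '.'             anything else: lower case (outside a block) or a gap '-'
    """
    out = []
    for t, q in zip(target, query):
        if t.isupper() and q.isupper():
            out.append("|" if t == q else mismatch_digit)
        else:
            out.append(".")
    return "".join(out)
```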
