SeqMatch-N

A program for aligning two multi-megabyte genome sequences by sequentially searching for the most significant regions of similarity.

The program is provided with a viewer.

Example of output:


L:426        Sequence  Duck alpha-D globin mRNA, complete cds. vs C:\Documents and Settings\My Documents\MolQuestWorkSpace\example_data\SeqMatch-N\seq1.fa
Total 1 sequences produce 1 significant alignment(s).

[DD]       1, S:      20.989, L:      429  Equus zebra alpha 1 globin gene, complete cds.
****************************************************************************
[DD] Sequence:       1(      1), S:      20.989, L:      429  Equus zebra alpha 1 globin gene, complete cds.
Summ of block lengths: 356, Alignment bounds:
On target  sequence: start         1, end       408, length 408
On query sequence: start           1, end       411, length 411
Block of alignment: 8        
    1 P:         1         1 L:       1, G: 100.00, W:     10, S:1
    2 P:         2         5 L:      21, G:  80.95, W:    130, S:5.65813
    3 P:        40        43 L:     159, G:  71.07, W:    670, S:13.332
    4 P:       205       208 L:       6, G: 100.00, W:     60, S:3.67423
    5 P:       216       219 L:      12, G:  91.67, W:    100, S:4.93771
    6 P:       235       238 L:      78, G:  80.77, W:    480, S:11.2317
    7 P:       326       329 L:      71, G:  66.20, W:    230, S:7.90613
    8 P:       401       404 L:       8, G: 100.00, W:     80, S:4.38178
          1         8        18        28        38        48
          A---TGCTGACCGCCGAGGACAAGAagctcatcacgcagttgTGGGAGAAGGTGGCTGGC
          |...|||||0|0||||00|||||||.................|||000|||||0|00|||
          AtggTGCTGTCTGCCGCCGACAAGAccaacgtcaaggccgccTGGAGTAAGGTTGGCGGC
          1        11        21        31        41        51

         58        68        78        88        98       108
          CACCAGGAGGAATTCGGAAGTGAAGCTCTGCAGAGGATGTTCCTCGCCTACCCCCAGACC
          0||000|00||0||0||0000||0||0||00|||||||||||||0|0||0||||000|||
          AACGCTGGCGAGTTTGGCGCAGAGGCCCTAGAGAGGATGTTCCTGGGCTTCCCCACCACC
         61        71        81        91       101       111

....

Where:

The first line is the header:


[DD] Sequence:       1(      1), S:      20.989, L:      429  Equus zebra alpha 1 globin gene, complete cds.
[DD] - target sequence in direct chain (D), query sequence in direct chain (D). Variants:
[DR] - target sequence in direct chain (D), query sequence in reverse chain (R).
[RD] - target sequence in reverse chain (R), query sequence in direct chain (D).
[RR] - target sequence in reverse chain (R), query sequence in reverse chain (R).
Sequence: 1( 1) - Ordinal number of the sequence within the submitted query set. The number in brackets is the ordinal number of this alignment for that sequence (relevant when the sequence produced more than one alignment). For example, 4(      5) is the fifth alignment of the fourth sequence in the set.
S - Score of this alignment.
L - Length of this query sequence.
Equus zebra alpha 1 globin gene, complete cds. - Name of this query sequence.
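
For readers who want to process the report programmatically, the header fields described above can be pulled apart with a regular expression. This is only a sketch based on the single sample line shown here: the names AlignmentHeader and parse_header are hypothetical and not part of SeqMatch-N, and the pattern assumes the score is always printed as a plain decimal number.

```python
import re
from typing import NamedTuple


class AlignmentHeader(NamedTuple):
    """Fields of one alignment header line (hypothetical container)."""
    target_chain: str      # 'D' (direct) or 'R' (reverse)
    query_chain: str       # 'D' (direct) or 'R' (reverse)
    sequence_index: int    # ordinal number of the query sequence in the set
    alignment_index: int   # ordinal number of this alignment for that sequence
    score: float           # S
    query_length: int      # L
    query_name: str


# Pattern inferred from the sample header line only (an assumption).
HEADER_RE = re.compile(
    r"\[(?P<t>[DR])(?P<q>[DR])\]\s+Sequence:\s*(?P<seq>\d+)\(\s*(?P<aln>\d+)\),"
    r"\s*S:\s*(?P<score>[\d.]+),\s*L:\s*(?P<length>\d+)\s+(?P<name>.*)"
)


def parse_header(line: str) -> AlignmentHeader:
    """Parse a '[DD] Sequence: ...' header line into its fields."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an alignment header line: {line!r}")
    return AlignmentHeader(
        target_chain=m.group("t"),
        query_chain=m.group("q"),
        sequence_index=int(m.group("seq")),
        alignment_index=int(m.group("aln")),
        score=float(m.group("score")),
        query_length=int(m.group("length")),
        query_name=m.group("name"),
    )


sample = ("[DD] Sequence:       1(      1), S:      20.989, L:      429  "
          "Equus zebra alpha 1 globin gene, complete cds.")
print(parse_header(sample))
```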

Additional information about the alignment:


Summ of block lengths: 356, Alignment bounds:
On target  sequence: start         1, end       408, length 408
On query sequence: start           1, end       411, length 411
length - The length covered by the alignment in the target and query sequences, respectively.
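
In the sample, each reported length equals end - start + 1 (408 - 1 + 1 = 408 for the target, 411 - 1 + 1 = 411 for the query). A minimal sketch of reading the bounds lines and checking that relation follows; the function name parse_bounds is hypothetical, and the pattern is inferred from the two sample lines only.

```python
import re

# Pattern inferred from the sample "On target/query sequence" lines (an assumption).
BOUNDS_RE = re.compile(
    r"On\s+(target|query)\s+sequence:\s*start\s+(\d+),\s*end\s+(\d+),\s*length\s+(\d+)"
)


def parse_bounds(line: str):
    """Return (which, start, end, length) from an alignment bounds line."""
    m = BOUNDS_RE.search(line)
    if m is None:
        raise ValueError(f"not a bounds line: {line!r}")
    which = m.group(1)
    start, end, length = (int(m.group(i)) for i in (2, 3, 4))
    return which, start, end, length


sample_lines = [
    "On target  sequence: start         1, end       408, length 408",
    "On query sequence: start           1, end       411, length 411",
]
for line in sample_lines:
    which, start, end, length = parse_bounds(line)
    # Holds for both sample lines; not confirmed for other outputs.
    print(which, start, end, length, end - start + 1 == length)
```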

List of alignment blocks:


Block of alignment: 8    
    1 P:         1         1 L:       1, G: 100.00, W:     10, S:1
    2 P:         2         5 L:      21, G:  80.95, W:    130, S:5.65813

Block of alignment: 8 - Number of blocks in this alignment.
Each following line describes one block. The fields of such a line are explained below:


    1 P:         1         1 L:       1, G: 100.00, W:     10, S:1 
1 - Block number.
P: 1 1 - Start positions of the similarity block in the target and query sequences, respectively. In this case the block starts at the first position of both sequences.
L: 1 - Length of this similarity block.
G: 100.00 - Homology of this similarity block, in percent.
W: 10 - Weight of this similarity block (the arithmetic sum of the symbol similarities, calculated from the given similarity matrix).
S:1 - Score of this similarity block.
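
The block lines can be read field by field in the same way. The sketch below parses the eight block lines from the example output and confirms that their L values add up to the reported "Summ of block lengths: 356". The names Block and parse_block are hypothetical, and the pattern is inferred from the sample lines only.

```python
import re
from typing import NamedTuple


class Block(NamedTuple):
    """One similarity block (hypothetical container)."""
    number: int
    target_start: int
    query_start: int
    length: int       # L
    homology: float   # G, in percent
    weight: int       # W
    score: float      # S


# Pattern inferred from the sample block lines (an assumption).
BLOCK_RE = re.compile(
    r"^\s*(\d+)\s+P:\s*(\d+)\s+(\d+)\s+L:\s*(\d+),"
    r"\s*G:\s*([\d.]+),\s*W:\s*(-?\d+),\s*S:\s*([\d.]+)\s*$"
)


def parse_block(line: str) -> Block:
    """Parse one line of the block list."""
    m = BLOCK_RE.match(line)
    if m is None:
        raise ValueError(f"not a block line: {line!r}")
    n, tpos, qpos, length, g, w, s = m.groups()
    return Block(int(n), int(tpos), int(qpos), int(length),
                 float(g), int(w), float(s))


SAMPLE_BLOCKS = """\
    1 P:         1         1 L:       1, G: 100.00, W:     10, S:1
    2 P:         2         5 L:      21, G:  80.95, W:    130, S:5.65813
    3 P:        40        43 L:     159, G:  71.07, W:    670, S:13.332
    4 P:       205       208 L:       6, G: 100.00, W:     60, S:3.67423
    5 P:       216       219 L:      12, G:  91.67, W:    100, S:4.93771
    6 P:       235       238 L:      78, G:  80.77, W:    480, S:11.2317
    7 P:       326       329 L:      71, G:  66.20, W:    230, S:7.90613
    8 P:       401       404 L:       8, G: 100.00, W:     80, S:4.38178
"""

blocks = [parse_block(line) for line in SAMPLE_BLOCKS.splitlines()]
print(len(blocks), "blocks, total length", sum(b.length for b in blocks))
```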

Alignment:


          1         8        18        28        38        48
          A---TGCTGACCGCCGAGGACAAGAagctcatcacgcagttgTGGGAGAAGGTGGCTGGC
          |...|||||0|0||||00|||||||.................|||000|||||0|00|||
          AtggTGCTGTCTGCCGCCGACAAGAccaacgtcaaggccgccTGGAGTAAGGTTGGCGGC
          1        11        21        31        41        51

Line 1 - Numbering of the target sequence.
Line 2 - The target sequence itself. Capital letters correspond to blocks of similarity, lowercase letters to unaligned regions.
Line 3 - Separator line. Separator line symbols: "|" - exact match between symbols. Digits indicate the degree of similarity between symbols, from 0 (no similarity) to 9 (maximal similarity).
Line 4 - The query sequence itself. Capital letters correspond to blocks of similarity, lowercase letters to unaligned regions.
Line 5 - Numbering of the query sequence.
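
The separator line can be related back to the G values in the block list. In the sample, the second block is 21 columns long and 17 of them are "|", which gives 17 / 21 = 80.95 %, the same figure reported as G for block 2. The sketch below splits a separator line into runs of block columns and computes that percentage for each run. It rests on two assumptions drawn from the sample alone: that "." marks columns outside similarity blocks, and that G is the share of "|" columns. The function name block_identities is hypothetical.

```python
from itertools import groupby


def block_identities(separator: str):
    """Return (length, percent of '|' columns) for each run of block columns.

    Runs are delimited by '.' columns, which in the sample appear under
    gaps and unaligned (lowercase) regions. Adjacent blocks with no '.'
    between them would be merged into one run.
    """
    results = []
    for outside_block, run in groupby(separator, key=lambda c: c == "."):
        if outside_block:
            continue
        columns = list(run)
        matches = columns.count("|")
        results.append((len(columns), round(100.0 * matches / len(columns), 2)))
    return results


# First separator line of the example alignment.
separator = "|...|||||0|0||||00|||||||.................|||000|||||0|00|||"
for length, identity in block_identities(separator):
    print(length, identity)
```

Only the first two runs are complete blocks. The third run is the beginning of block 3, which continues on the following alignment rows, so its percentage is not comparable with the G value of that block.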