GenomeMatch

Alignment of two genomes or chromosomes.
A program for fast alignment of prokaryotic genomes, chromosomes and chromosomal contigs, as well as genomes of mitochondria and other organelles, viruses, etc. The program finds relatively long regions of similarity, which may contain internal gaps. Such regions may overlap each other, i.e. some nucleotides in either the query or the target sequence may belong to more than one alignment.

Output example:


L:4403836    Sequence gb|AE000516|AE000516 Mycobacterium tuberculosis CDC1551, complete genome vs C:\Program Files\Softberry\MolQuest\example\data\GenomeMatch\seq2.fna
[DD] Sequence:       1(     14), S:       726.8, L:  4411529 emb|AL123456|MTBH37RV Mycobacterium tuberculosis complete genome
Summ of block lengths: 176235, Alignment bounds:
On first  sequence: start   1266719, end   1442971, length 176253
On second sequence: start   1267228, end   1443483, length 176256
Block of alignment: 9        
    1 P:   1266719   1267228 L:   10640, G:  99.98, W: 106350, S:178.608
    2 P:   1277360   1277868 L:    6697, G:  99.90, W:  66760, S:141.524
    3 P:   1284070   1284580 L:   26749, G:  99.98, W: 267317, S:283.187
    4 P:   1310820   1311331 L:    2005, G: 100.00, W:  20050, S:77.5178
    5 P:   1312827   1313337 L:      53, G: 100.00, W:    530, S:12.3781
    6 P:   1312880   1313391 L:   52449, G:  99.96, W: 523830, S:396.44
    7 P:   1365330   1365840 L:   23182, G:  99.99, W: 231720, S:263.654
    8 P:   1388512   1389023 L:   20355, G:  99.99, W: 203470, S:247.058
    9 P:   1408867   1409379 L:   34105, G:  99.98, W: 340857, S:319.777
    1266704   1266704   1266705   1266715   1266725   1266735
          ---------------(..)tgggaccgccattgcCGGGCCGTTCCACGGCCCGTATCGTC
          ...............(..)...............||||||||||||||||||||||||||
          ttgaccgatgacccc(..)tgcgcggcttctcctCGGGCCGTTCCACGGCCCGTATCGTC
          1        11   1267214   1267224   1267234   1267244

    1266745   1266755   1266765   1266775   1266785   1266795
          GCCGCGCTAGGTTGGACGCTGTGCGGATCGTGGTGAGCAGTGCCACCAGAAATGCGGGTT
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
          GCCGCGCTAGGTTGGACGCTGTGCGGATCGTGGTGAGCAGTGCCACCAGAAATGCGGGTT
    1267254   1267264   1267274   1267284   1267294   1267304

    1266805   1266815   1266825   1266835   1266845   1266855
          CGTACACCTGTGTCAGCACCGGCAGCGCTGGATGCCGCGAGATTACACCGCCCCTCGCTG
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
          CGTACACCTGTGTCAGCACCGGCAGCGCTGGATGCCGCGAGATTACACCGCCCCTCGCTG
    1267314   1267324   1267334   1267344   1267354   1267364

    1266865   1266875   1266885   1266895   1266905   1266915
          GGCCCACGCCTGGGCCGGTGAACCCCGGCCCGCCCGCTGGCACCCTGCGAACCAGCCTGC
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
          GGCCCACGCCTGGGCCGGTGAACCCCGGCCCGCCCGCTGGCACCCTGCGAACCAGCCTGC
    1267374   1267384   1267394   1267404   1267414   1267424

Where:

The first line of an alignment is the header:


[DD] Sequence:       1(     14), S:       726.8, L:  4411529 emb|AL123456|MTBH37RV Mycobacterium tuberculosis complete genome
[DD] Target sequence on the direct strand (D), query sequence on the direct strand (D). Variants:
[DR] - target sequence on the direct strand (D), query sequence on the reverse strand (R).
[RD] - target sequence on the reverse strand (R), query sequence on the direct strand (D).
[RR] - target sequence on the reverse strand (R), query sequence on the reverse strand (R).
Sequence: 1(    14) Ordinal number of the sequence in the query set that was submitted for alignment. The number in parentheses is the ordinal number of this alignment for that sequence (used when the sequence produced more than one alignment). Example: 4(5) - the fifth alignment of the fourth sequence in the set.
S Score of this alignment.
L Length of this query sequence
emb|AL123456|MTBH37RV Mycobacterium tuberculosis complete genome Name of this query sequence
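
The header fields described above can be extracted with a regular expression. The following Python sketch is not part of GenomeMatch; the pattern and the names `HEADER_RE` and `parse_header` are assumptions inferred from the single sample header shown above, so other outputs may need adjustments.

```python
import re

# Pattern inferred from the one sample header line in this document.
# It is an assumption, not an official format specification.
HEADER_RE = re.compile(
    r"\[(?P<target_strand>[DR])(?P<query_strand>[DR])\]\s+Sequence:\s*"
    r"(?P<seq_no>\d+)\(\s*(?P<aln_no>\d+)\),\s*"
    r"S:\s*(?P<score>[\d.]+),\s*L:\s*(?P<length>\d+)\s+(?P<name>.*)"
)


def parse_header(line):
    """Split an alignment header line into its documented fields."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError("not an alignment header line: %r" % line)
    return {
        "target_strand": m.group("target_strand"),  # D = direct, R = reverse
        "query_strand": m.group("query_strand"),
        "sequence_number": int(m.group("seq_no")),
        "alignment_number": int(m.group("aln_no")),
        "score": float(m.group("score")),
        "length": int(m.group("length")),
        "name": m.group("name").strip(),
    }


if __name__ == "__main__":
    sample = ("[DD] Sequence:       1(     14), S:       726.8, "
              "L:  4411529 emb|AL123456|MTBH37RV Mycobacterium "
              "tuberculosis complete genome")
    print(parse_header(sample))
```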

Additional information about alignment:


Summ of block lengths: 176235, Alignment bounds:
On first  sequence: start   1266719, end   1442971, length 176253
On second sequence: start   1267228, end   1443483, length 176256
length The length covered by the alignment on the target and query sequences, respectively.
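
In the sample above, each reported length equals end - start + 1, i.e. both bounds are inclusive (1442971 - 1266719 + 1 = 176253 and 1443483 - 1267228 + 1 = 176256). The Python sketch below parses a bounds line and checks this relation. The names `BOUNDS_RE`, `parse_bounds` and `span_length` are hypothetical, and the inclusive-bounds rule is inferred from this one example only.

```python
import re

# Pattern inferred from the two sample "On ... sequence" lines (assumption).
BOUNDS_RE = re.compile(
    r"On\s+(?P<which>first|second)\s+sequence:\s*start\s+(?P<start>\d+),"
    r"\s*end\s+(?P<end>\d+),\s*length\s+(?P<length>\d+)"
)


def parse_bounds(line):
    """Return (which, start, end, length) from an alignment bounds line."""
    m = BOUNDS_RE.search(line)
    if m is None:
        raise ValueError("not an alignment bounds line: %r" % line)
    return (m.group("which"), int(m.group("start")),
            int(m.group("end")), int(m.group("length")))


def span_length(start, end):
    """Length of a region whose start and end positions are both inclusive."""
    return end - start + 1


if __name__ == "__main__":
    line = "On first  sequence: start   1266719, end   1442971, length 176253"
    which, start, end, length = parse_bounds(line)
    print(which, span_length(start, end) == length)
```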

List of alignment blocks:


Block of alignment: 9        
    1 P:   1266719   1267228 L:   10640, G:  99.98, W: 106350, S:178.608
    2 P:   1277360   1277868 L:    6697, G:  99.90, W:  66760, S:141.524

Block of alignment: 9 - Number of blocks in this alignment.
Each line below describes one block. The fields of a line from this list are described below:


    1 P:   1266719   1267228 L:   10640, G:  99.98, W: 106350, S:178.608
1 Block number.
P: 1266719     1267228 Start positions of the similarity block on the target and query sequences, respectively.
L: 10640 Length of this similarity block.
G: 99.98 Homology of this similarity block.
W: 106350 Weight of this similarity block (the arithmetic sum of the per-symbol similarity values taken from the given similarity matrix).
S:178.608 Score of this similarity block.
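
A block line can be read into a small record, and the L values of all blocks in an alignment can be summed. For the nine blocks of the sample output the sum is 176235, which matches the reported "Summ of block lengths". The Python sketch below is illustrative only; `Block`, `BLOCK_RE` and `parse_block` are hypothetical names, and the pattern is inferred from the sample lines.

```python
import re
from typing import NamedTuple


class Block(NamedTuple):
    number: int        # block number
    target_start: int  # first P value
    query_start: int   # second P value
    length: int        # L
    homology: float    # G
    weight: int        # W
    score: float       # S


# Pattern inferred from the sample block lines (assumption).
BLOCK_RE = re.compile(
    r"^\s*(\d+)\s+P:\s*(\d+)\s+(\d+)\s+L:\s*(\d+),\s*G:\s*([\d.]+),"
    r"\s*W:\s*(\d+),\s*S:\s*([\d.]+)\s*$"
)


def parse_block(line):
    """Parse one line of the block list into a Block record."""
    m = BLOCK_RE.match(line)
    if m is None:
        raise ValueError("not a block line: %r" % line)
    n, tpos, qpos, length, g, w, s = m.groups()
    return Block(int(n), int(tpos), int(qpos), int(length),
                 float(g), int(w), float(s))


if __name__ == "__main__":
    lines = [
        "    1 P:   1266719   1267228 L:   10640, G:  99.98, W: 106350, S:178.608",
        "    2 P:   1277360   1277868 L:    6697, G:  99.90, W:  66760, S:141.524",
    ]
    blocks = [parse_block(x) for x in lines]
    print(sum(b.length for b in blocks))
```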

Alignment:


    1266704   1266704   1266705   1266715   1266725   1266735
          ---------------(..)tgggaccgccattgcCGGGCCGTTCCACGGCCCGTATCGTC
          ...............(..)...............||||||||||||||||||||||||||
          ttgaccgatgacccc(..)tgcgcggcttctcctCGGGCCGTTCCACGGCCCGTATCGTC
          1        11   1267214   1267224   1267234   1267244

Line 1 - Numbering of the target sequence.
Line 2 - The target sequence itself. Capital letters correspond to blocks of similarity, lower-case letters to unaligned regions.
Line 3 - Separator line. Separator line symbols: "|" - perfect match between symbols. Digits indicate the degree of similarity between symbols and range from 0 to 9: 0 - no similarity, 9 - maximal similarity.
Line 4 - The query sequence itself. Capital letters correspond to blocks of similarity, lower-case letters to unaligned regions.
Line 5 - Numbering of the query sequence.
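
The separator line can be summarized by counting its symbols. The Python sketch below counts "|" columns and collects the digit scores. The function name `summarize_separator` is hypothetical, and every symbol other than "|" or a digit (for example "." or the "(..)" marker seen in the sample) is simply counted as "other", since its meaning is not described here.

```python
def summarize_separator(separator):
    """Count perfect matches and collect digit similarity scores (0-9)."""
    perfect = 0
    digit_scores = []
    other = 0  # symbols whose meaning is not documented above
    for ch in separator.strip():
        if ch == "|":
            perfect += 1
        elif ch in "0123456789":
            digit_scores.append(int(ch))
        else:
            other += 1
    return {"perfect": perfect, "digit_scores": digit_scores, "other": other}


if __name__ == "__main__":
    print(summarize_separator("          ..(..)||7|0"))
```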