EstMap
EstMap is a program for mapping an entire set of mRNA/EST sequences to a chromosome sequence. For example, 11,000 full-length mRNA sequences from the NCBI Reference Sequence (RefSeq) set were mapped to an unmasked 52-Mb fragment of the Y chromosome in about 18-25 minutes, depending on the amount of computer memory. To map more accurately, EstMap takes into account the statistical features of splice sites.
EstMap is part of the FGENESH++C genome annotation pipeline, where it maps RefSeq sequences to the query genome at a very early stage of annotation.
Example of EstMap output:
L:4000001 Sequence chr7 [cut:73000000 77000000] vs C:\Documents and Settings\My Documents\MolQuestWorkSpace\example_data\EstMap\seq.fa

[DD] Sequence: 1( 1), S: 36.26, L: 457 AA628013 nq61d05.s1 NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1148361 3', mRN
Summ of block lengths: 457, Alignment bounds:
On first sequence: start 2214596, end 2215412, length 817
On second sequence: start 1, end 457, length 457
Block of alignment: 4
1 E: 2214596 234 [ct CT] P: 2214596 1 L: 234, G: 99.57, W: 2305, S:26.2324
2 E: 2214966 69 [AC CT] P: 2214966 235 L: 69, G: 100.00, W: 690, S:14.1834
3 E: 2215144 65 [AC CT] P: 2215144 304 L: 65, G: 100.00, W: 650, S:13.7542
4 E: 2215324 89 [AC aa] P: 2215324 369 L: 89, G: 97.75, W: 820, S:15.6754

      1 gagccaagattgtgc(..)acgctcaggccacct?[CTGGGCCTCTCTTTATTGAGGGCA
        ...............(..)...............  ||||||||||||||||||||||||
      1 ---------------(..)---------------  CTGGGCCTCTCTTTATTGAGGGCA

2214620 CTGGGCCCAGGTCTTCCTTCAGGGCCCACAGCGCCCATAAAACCCAAGGGAGAATAGAAG
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
     25 CTGGGCCCAGGTCTTCCTTCAGGGCCCACAGCGCCCATAAAACCCAAGGGAGAATAGAAG

2214680 AGACCCCCTGATACACGCACACTCGAGGGGCGCCTCCCATCCCCTCCCACAACACACAGG
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
     85 AGACCCCCTGATACACGCACACTCGAGGGGCGCCTCCCATCCCCTCCCACAACACACAGG

2214740 ACAGAAGCCCCTCTGGGCCGGCAGGGGAAGGCCCAGCCTCAATCCTTCTTGCTCCCGTGC
        |||||||||||||||||||||||0||||||||||||||||||||||||||||||||||||
    145 ACAGAAGCCCCTCTGGGCCGGCAAGGGAAGGCCCAGCCTCAATCCTTCTTGCTCCCGTGC

2214800 CGCTGACTGTGAAACTTGTGGTGCACAACC]ctcagggtggtgaag(..)gggaccccgg
        |||||||||||||||||||||||||||||| ...............(..)..........
    205 CGCTGACTGTGAAACTTGTGGTGCACAACC ---------------(..)----------

2214961 ctcac[CTGCCACTCCTTGCACTGAGGGTCCTGGGCCAGGTTGAACAACGTCAGCGCGTT
        ..... ||||||||||||||||||||||||||||||||||||||||||||||||||||||
    235 ----- CTGCCACTCCTTGCACTGAGGGTCCTGGGCCAGGTTGAACAACGTCAGCGCGTT

2215020 AAAAAGCTGCCAGAA]ctaagcagggaggag(..)agaggcacgacttac[GTGTCCAAA
        ||||||||||||||| ...............(..)............... |||||||||
    289 AAAAAGCTGCCAGAA ---------------(..)--------------- GTGTCCAAA

2215153 GAAAAGAAAAGGCAGCAGGAAGGTGAGGCCCCGCCACATCCAGGACTGGAAGCCCT]ctg
        |||||||||||||||||||||||||||||||||||||||||||||||||||||||| ...
    313 GAAAAGAAAAGGCAGCAGGAAGGTGAGGCCCCGCCACATCCAGGACTGGAAGCCCT ---

2215212 cggggaggaagg(..)ccactcccgactcac[CCACAGTGAGGTCCATGGTGTGCCGCTC
        ............(..)............... ||||||||||||||||||||||||||||
    369 ------------(..)--------------- CCACAGTGAGGTCCATGGTGTGCCGCTC

2215352 GCCCAGCGCCCGCAGGCGGTAGAGGCAGCCGCTCTGGTAGTAGTACTGGAGAAACTGCAC
        ||||||||||||||||0|0|||||||||||||||||||||||||||||||||||||||||
    397 GCCCAGCGCCCGCAGGGGATAGAGGCAGCCGCTCTGGTAGTAGTACTGGAGAAACTGCAC

2215412 G]?aagcctgggccgggc(..)tacagcaaaactgga
        |  ...............(..)...............
    457 G  ---------------(..)---------------
[DD] Sequence: 1( 1), S: 36.26, L: 457 AA628013 nq61d05.s1 NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1148361 3', mRNA sequence.
[DD] | Orientation of the two sequences: here, the target sequence is in the direct strand (D) and the query sequence is in the direct strand (D). Other possible values:
[DR] - target sequence in the direct strand (D), query sequence in the reverse strand (R).
[RD] - target sequence in the reverse strand (R), query sequence in the direct strand (D).
[RR] - target sequence in the reverse strand (R), query sequence in the reverse strand (R). |
Sequence: 1( 1) | Order number of the query sequence within the submitted set. The number in parentheses is the order number of this alignment for that sequence (relevant when the sequence produced more than one alignment). For example, 4( 5) denotes the fifth alignment of the fourth sequence in the set. |
S | Score of this alignment. |
L | Length of the query sequence. |
AA628013 nq61d05.s1 NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1148361 3', mRNA sequence. | Name of the query sequence. |
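The header line of each alignment packs all of the fields above into one line. The sketch below splits it with a regular expression. The pattern and the names `parse_header` and `AlignmentHeader` are hypothetical: they are inferred from the single sample line shown here, not from an EstMap format specification, so other output variants may need a different pattern.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern inferred only from the sample "[DD] Sequence: ..." line;
# it is not an official EstMap format specification.
HEADER_RE = re.compile(
    r"^\[(?P<strands>[DR]{2})\]\s+Sequence:\s*(?P<seq>\d+)\(\s*(?P<aln>\d+)\),"
    r"\s*S:\s*(?P<score>-?[\d.]+),\s*L:\s*(?P<length>\d+)\s+(?P<name>.*)$"
)

# First letter of the code: target strand; second letter: query strand.
STRANDS = {"D": "direct", "R": "reverse"}


@dataclass
class AlignmentHeader:
    target_strand: str
    query_strand: str
    sequence_number: int   # order number of the query in the submitted set
    alignment_number: int  # order number of this alignment for that query
    score: float           # S
    query_length: int      # L
    name: str              # query sequence name


def parse_header(line: str) -> AlignmentHeader:
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an alignment header line: {line!r}")
    code = m.group("strands")
    return AlignmentHeader(
        target_strand=STRANDS[code[0]],
        query_strand=STRANDS[code[1]],
        sequence_number=int(m.group("seq")),
        alignment_number=int(m.group("aln")),
        score=float(m.group("score")),
        query_length=int(m.group("length")),
        name=m.group("name"),
    )


h = parse_header("[DD] Sequence: 1( 1), S: 36.26, L: 457 AA628013 nq61d05.s1 "
                 "NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1148361 3', mRNA sequence.")
print(h.target_strand, h.query_strand, h.sequence_number, h.alignment_number, h.query_length)
# direct direct 1 1 457
```

The strand code is decoded letter by letter, so all four variants ([DD], [DR], [RD], [RR]) are handled by the same lookup table.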
Summ of block lengths: 457, Alignment bounds:
On first sequence: start 2214596, end 2215412, length 817
On second sequence: start 1, end 457, length 457
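Summ of block lengths | Sum of the lengths of all similarity blocks (in the example, 234 + 69 + 65 + 89 = 457). |
start, end | Boundaries of the alignment on the first (target) and second (query) sequence. |
length | Length covered by the alignment in the target and query sequences, respectively (end - start + 1). |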
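The lengths in this line can be cross-checked with simple arithmetic. The sketch assumes 1-based, inclusive coordinates, which matches the sample (2215412 - 2214596 + 1 = 817); `span_length` is a hypothetical helper name. Because the target is shown without gaps inside the blocks in this example, the difference between the target span and the sum of block lengths is the number of target positions lying between the blocks.

```python
def span_length(start: int, end: int) -> int:
    """Length of a region given 1-based, inclusive start and end positions."""
    return end - start + 1


# Values copied from the sample output.
block_lengths = [234, 69, 65, 89]           # L: values of blocks 1-4
target_len = span_length(2214596, 2215412)  # "On first sequence"
query_len = span_length(1, 457)             # "On second sequence"

print(sum(block_lengths))                # 457, the "Summ of block lengths"
print(target_len, query_len)             # 817 457
print(target_len - sum(block_lengths))   # 360 target positions lie between blocks
```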
Block of alignment: 4
1 E: 2214596 234 [ct CT] P: 2214596 1 L: 234, G: 99.57, W: 2305, S:26.2324
2 E: 2214966 69 [AC CT] P: 2214966 235 L: 69, G: 100.00, W: 690, S:14.1834
Block of alignment: 4 - Number of similarity blocks (exons) in this alignment.
Each of the following lines describes one block. The fields of a block line are explained below, using block 1 as an example:
1 E: 2214596 234 [ct CT] P: 2214596 1 L: 234, G: 99.57, W: 2305, S:26.2324
1 | Block number. |
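E: 2214596 234 [ct CT] | Start position and length of the exon in the first (target) sequence.
[ct CT] - the two nucleotides flanking the exon on the left and on the right (for block 1, "ct" immediately precedes the exon and "CT" immediately follows it). Lower-case letters: the edge is defined imprecisely. Capital letters: the edge is defined precisely. |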
P: 2214596 1 | Start positions of this similarity block in the target and query sequences, respectively. |
L: 234 | Length of this similarity block. |
G: 99.57 | Homology of this similarity block: the percentage of identical positions (for block 1, 233 of 234). |
W: 2305 | Weight of this similarity block: the sum of the similarity scores of all aligned position pairs, taken from the given similarity matrix. |
S:26.2324 | Score of this similarity block. |
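A block line can be split into the fields listed above. The sketch below is a hypothetical parser (`parse_block`, `Block` and the regular expression are illustrative names inferred from the four sample block lines, not part of EstMap). It reports which exon edges are precise, using the upper-case/lower-case convention described above, and computes the unaligned target stretches between consecutive exons from the E: start and length values.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern inferred from the sample block lines; not an official spec.
BLOCK_RE = re.compile(
    r"^\s*(?P<number>\d+)\s+E:\s*(?P<exon_start>\d+)\s+(?P<exon_length>\d+)\s+"
    r"\[(?P<left>[A-Za-z]{2})\s+(?P<right>[A-Za-z]{2})\]\s+"
    r"P:\s*(?P<target_pos>\d+)\s+(?P<query_pos>\d+)\s+"
    r"L:\s*(?P<length>\d+),\s*G:\s*(?P<identity>[\d.]+),\s*"
    r"W:\s*(?P<weight>-?\d+),\s*S:\s*(?P<score>-?[\d.]+)\s*$"
)

INT_FIELDS = {"number", "exon_start", "exon_length", "target_pos", "query_pos", "length", "weight"}
FLOAT_FIELDS = {"identity", "score"}


@dataclass
class Block:
    number: int
    exon_start: int
    exon_length: int
    left: str        # two nucleotides before the exon
    right: str       # two nucleotides after the exon
    target_pos: int
    query_pos: int
    length: int
    identity: float  # G
    weight: int      # W
    score: float     # S

    def edges_precise(self) -> tuple[bool, bool]:
        # Capital letters: edge defined precisely; lower case: imprecisely.
        return self.left.isupper(), self.right.isupper()


def parse_block(line: str) -> Block:
    m = BLOCK_RE.match(line)
    if m is None:
        raise ValueError(f"not a block line: {line!r}")
    fields = {}
    for key, value in m.groupdict().items():
        if key in INT_FIELDS:
            fields[key] = int(value)
        elif key in FLOAT_FIELDS:
            fields[key] = float(value)
        else:
            fields[key] = value
    return Block(**fields)


sample = [
    "1 E: 2214596 234 [ct CT] P: 2214596 1 L: 234, G: 99.57, W: 2305, S:26.2324",
    "2 E: 2214966 69 [AC CT] P: 2214966 235 L: 69, G: 100.00, W: 690, S:14.1834",
    "3 E: 2215144 65 [AC CT] P: 2215144 304 L: 65, G: 100.00, W: 650, S:13.7542",
    "4 E: 2215324 89 [AC aa] P: 2215324 369 L: 89, G: 97.75, W: 820, S:15.6754",
]
blocks = [parse_block(s) for s in sample]
print([b.edges_precise() for b in blocks])
# [(False, True), (True, True), (True, True), (True, False)]

# Unaligned target stretches between consecutive exons.
gaps = [n.exon_start - (b.exon_start + b.exon_length) for b, n in zip(blocks, blocks[1:])]
print(gaps)  # [136, 109, 115]
```

The three gaps add up to 360, the same number obtained by subtracting the sum of block lengths from the target span of the alignment.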
1 gagccaagattgtgc(..)acgctcaggccacct?[CTGGGCCTCTCTTTATTGAGGGCA
  ...............(..)...............  ||||||||||||||||||||||||
1 ---------------(..)---------------  CTGGGCCTCTCTTTATTGAGGGCA
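Line 1 - The target sequence. Capital letters correspond to similarity blocks, lower-case letters to unaligned regions. [ and ] mark the edges of an exon; ?[ (or ]?) marks an uncertain exon edge; (..) stands for the omitted middle part of a long unaligned region.
Line 2 - Separator line: | marks identical positions, 0 marks mismatches, and dots lie under unaligned regions of the target.
Line 3 - The query sequence. Capital letters correspond to similarity blocks, lower-case letters to unaligned regions; dashes fill the positions opposite target regions that have no counterpart in the query.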
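The G and W values of a block can be recomputed from its separator characters. The sketch below assumes a match score of +10 and a mismatch score of -25. These values are not stated in this description; they are inferred because they reproduce all four W values of the sample (for example, block 1 has 233 identities and 1 mismatch, and 233 * 10 - 25 = 2305). The actual similarity matrix may differ, so both scores are parameters. `block_stats` is a hypothetical helper; gapped positions and the block score S are not handled.

```python
def block_stats(match_line: str, match: int = 10, mismatch: int = -25) -> tuple[float, int]:
    """Return (G, W) for one similarity block from its separator characters.

    match_line holds only the columns under the block: '|' for an identical
    position, '0' for a mismatch. The default scores (+10 / -25) are an
    assumption inferred from the sample output, not EstMap documentation.
    """
    n_match = match_line.count("|")
    n_mismatch = match_line.count("0")
    identity = round(100.0 * n_match / (n_match + n_mismatch), 2)  # G, in percent
    weight = n_match * match + n_mismatch * mismatch                # W
    return identity, weight


# Block 4 of the sample spans three display lines: 28 + 60 + 1 columns.
block4 = "|" * 28 + ("|" * 16 + "0|0" + "|" * 41) + "|"
print(len(block4), block_stats(block4))  # 89 (97.75, 820)
```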