EstMap
EstMap is a program for mapping a whole set of mRNAs/ESTs to a chromosome sequence. For example, 11,000 full mRNA sequences from the NCBI reference set were mapped to a 52-MB unmasked fragment of the Y chromosome in about 18-25 min, depending on computer memory size. EstMap takes the statistical features of splice sites into account for more accurate mapping.
EstMap is part of the FGENESH++C genome annotation pipeline, where it maps RefSeq sequences to a query genome at a very early stage of annotation.
Example of EstMap output:
L:4000001 Sequence chr7 [cut:73000000 77000000] vs C:\Documents and Settings\My Documents\MolQuestWorkSpace\example_data\EstMap\seq.fa
[DD] Sequence: 1( 1), S: 36.26, L: 457 AA628013 nq61d05.s1 NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1148361 3', mRN
Summ of block lengths: 457, Alignment bounds:
On first sequence: start 2214596, end 2215412, length 817
On second sequence: start 1, end 457, length 457
Block of alignment: 4
1 E: 2214596 234 [ct CT] P: 2214596 1 L: 234, G: 99.57, W: 2305, S:26.2324
2 E: 2214966 69 [AC CT] P: 2214966 235 L: 69, G: 100.00, W: 690, S:14.1834
3 E: 2215144 65 [AC CT] P: 2215144 304 L: 65, G: 100.00, W: 650, S:13.7542
4 E: 2215324 89 [AC aa] P: 2215324 369 L: 89, G: 97.75, W: 820, S:15.6754
1 gagccaagattgtgc(..)acgctcaggccacct?[CTGGGCCTCTCTTTATTGAGGGCA
...............(..)............... ||||||||||||||||||||||||
1 ---------------(..)--------------- CTGGGCCTCTCTTTATTGAGGGCA
2214620 CTGGGCCCAGGTCTTCCTTCAGGGCCCACAGCGCCCATAAAACCCAAGGGAGAATAGAAG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
25 CTGGGCCCAGGTCTTCCTTCAGGGCCCACAGCGCCCATAAAACCCAAGGGAGAATAGAAG
2214680 AGACCCCCTGATACACGCACACTCGAGGGGCGCCTCCCATCCCCTCCCACAACACACAGG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
85 AGACCCCCTGATACACGCACACTCGAGGGGCGCCTCCCATCCCCTCCCACAACACACAGG
2214740 ACAGAAGCCCCTCTGGGCCGGCAGGGGAAGGCCCAGCCTCAATCCTTCTTGCTCCCGTGC
|||||||||||||||||||||||0||||||||||||||||||||||||||||||||||||
145 ACAGAAGCCCCTCTGGGCCGGCAAGGGAAGGCCCAGCCTCAATCCTTCTTGCTCCCGTGC
2214800 CGCTGACTGTGAAACTTGTGGTGCACAACC]ctcagggtggtgaag(..)gggaccccgg
|||||||||||||||||||||||||||||| ...............(..)..........
205 CGCTGACTGTGAAACTTGTGGTGCACAACC ---------------(..)----------
2214961 ctcac[CTGCCACTCCTTGCACTGAGGGTCCTGGGCCAGGTTGAACAACGTCAGCGCGTT
..... ||||||||||||||||||||||||||||||||||||||||||||||||||||||
235 ----- CTGCCACTCCTTGCACTGAGGGTCCTGGGCCAGGTTGAACAACGTCAGCGCGTT
2215020 AAAAAGCTGCCAGAA]ctaagcagggaggag(..)agaggcacgacttac[GTGTCCAAA
||||||||||||||| ...............(..)............... |||||||||
289 AAAAAGCTGCCAGAA ---------------(..)--------------- GTGTCCAAA
2215153 GAAAAGAAAAGGCAGCAGGAAGGTGAGGCCCCGCCACATCCAGGACTGGAAGCCCT]ctg
|||||||||||||||||||||||||||||||||||||||||||||||||||||||| ...
313 GAAAAGAAAAGGCAGCAGGAAGGTGAGGCCCCGCCACATCCAGGACTGGAAGCCCT ---
2215212 cggggaggaagg(..)ccactcccgactcac[CCACAGTGAGGTCCATGGTGTGCCGCTC
............(..)............... ||||||||||||||||||||||||||||
369 ------------(..)--------------- CCACAGTGAGGTCCATGGTGTGCCGCTC
2215352 GCCCAGCGCCCGCAGGCGGTAGAGGCAGCCGCTCTGGTAGTAGTACTGGAGAAACTGCAC
||||||||||||||||0|0|||||||||||||||||||||||||||||||||||||||||
397 GCCCAGCGCCCGCAGGGGATAGAGGCAGCCGCTCTGGTAGTAGTACTGGAGAAACTGCAC
2215412 G]?aagcctgggccgggc(..)tacagcaaaactgga
| ...............(..)...............
457 G ---------------(..)---------------
[DD] Sequence: 1( 1), S: 36.26, L: 457 AA628013 nq61d05.s1 NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1148361 3', mRNA sequence.
| [DD] | Target sequence in direct chain (D), query sequence in direct chain (D). Variants: [DR] - target sequence in direct chain (D), query sequence in reverse chain (R); [RD] - target sequence in reverse chain (R), query sequence in direct chain (D); [RR] - target sequence in reverse chain (R), query sequence in reverse chain (R). |
| Sequence: 1( 1) | Ordinal number of the sequence within the query set submitted for alignment. The number in parentheses is the ordinal number of this alignment for that sequence (a sequence may produce more than one alignment). For example, 4( 5) is the fifth alignment of the fourth sequence in the set. |
| S | Score of this alignment. |
| L | Length of this query sequence. |
| AA628013 nq61d05.s1 NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1148361 3', mRNA sequence. | Name of this query sequence. |
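The header fields described above can be split apart with a regular expression. The following is a minimal sketch in Python, assuming every header follows the layout of the sample line; `HEADER_RE` and `parse_header` are hypothetical names introduced for illustration and are not part of EstMap.

```python
import re

# Hypothetical pattern, written against the sample header line only.
HEADER_RE = re.compile(
    r"^\[(?P<target_chain>[DR])(?P<query_chain>[DR])\]\s+"
    r"Sequence:\s*(?P<seq_no>\d+)\(\s*(?P<aln_no>\d+)\),\s*"
    r"S:\s*(?P<score>-?[\d.]+),\s*"
    r"L:\s*(?P<length>\d+)\s+"
    r"(?P<name>.*)$"
)


def parse_header(line):
    """Split an alignment header line into its documented fields."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an alignment header: {line!r}")
    return {
        "target_chain": m["target_chain"],  # D = direct, R = reverse
        "query_chain": m["query_chain"],    # D = direct, R = reverse
        "seq_no": int(m["seq_no"]),         # ordinal number in the query set
        "aln_no": int(m["aln_no"]),         # ordinal number of the alignment
        "score": float(m["score"]),         # S: score of the alignment
        "length": int(m["length"]),         # L: length of the query sequence
        "name": m["name"],                  # name of the query sequence
    }


SAMPLE_HEADER = (
    "[DD] Sequence: 1( 1), S: 36.26, L: 457 AA628013 nq61d05.s1 "
    "NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1148361 3', mRNA sequence."
)
header = parse_header(SAMPLE_HEADER)
print(header["seq_no"], header["aln_no"], header["score"], header["length"])
```

The pattern is deliberately strict, so a line that deviates from the sample layout raises an error rather than being silently misread.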
Summ of block lengths: 457, Alignment bounds:
On first sequence: start 2214596, end 2215412, length 817
On second sequence: start 1, end 457, length 457
| length | The length covered by the alignment in the target and query sequences, respectively. |
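The bounds in the sample are consistent with 1-based, inclusive coordinates, so each length equals end - start + 1. The short Python sketch below checks this arithmetic against the sample values; `covered_length` is a hypothetical helper, and the inclusive-coordinate convention is an inference from the sample rather than a documented guarantee.

```python
def covered_length(start, end):
    """Length of a range, assuming 1-based inclusive coordinates."""
    return end - start + 1


# Values copied from the sample output.
target_span = covered_length(2214596, 2215412)  # "On first sequence"
query_span = covered_length(1, 457)             # "On second sequence"
block_sum = 457                                 # "Summ of block lengths"

# Target positions inside the alignment bounds that belong to no block.
unaligned_in_target = target_span - block_sum

print(target_span, query_span, unaligned_in_target)
```

In the sample, the target span (817) exceeds the sum of block lengths (457) by 360 positions, which are the regions lying between the aligned blocks.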
Block of alignment: 4
1 E: 2214596 234 [ct CT] P: 2214596 1 L: 234, G: 99.57, W: 2305, S:26.2324
2 E: 2214966 69 [AC CT] P: 2214966 235 L: 69, G: 100.00, W: 690, S:14.1834
Block of alignment: 4 - the number of blocks in this alignment.
Each of the lines that follow describes one block. The fields of such a line are explained below, using the first block as an example:
1 E: 2214596 234 [ct CT] P: 2214596 1 L: 234, G: 99.57, W: 2305, S:26.2324
| 1 | Block number. |
| E: 2214596 234 [ct CT] | Start position and length of the exon in the first sequence. [ct CT] - the nucleotides at the edges of the exon. Lowercase letters mean the edge is defined imprecisely; capital letters mean the edge is defined precisely. |
| P: 2214596 1 | Start positions of the similarity block in the target and query sequences, respectively. |
| L: 234 | Length of this similarity block. |
| G: 99.57 | Homology of this similarity block, as a percentage. |
| W: 2305 | Weight of this similarity block (the arithmetic sum of the per-symbol similarity values taken from the given similarity matrix). |
| S:26.2324 | Score of this similarity block. |
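The block fields above can be parsed in the same way, and the gaps between consecutive exons can then be derived from the E: values. This is a sketch only: `BLOCK_RE`, `parse_block` and `intron_gaps` are hypothetical names, the pattern is written against the four sample lines, and the coordinates are assumed to be 1-based and inclusive on the direct chain.

```python
import re

# Hypothetical pattern, written against the sample block lines only.
BLOCK_RE = re.compile(
    r"^\s*(?P<num>\d+)\s+"
    r"E:\s*(?P<exon_start>\d+)\s+(?P<exon_len>\d+)\s+"
    r"\[(?P<left>\S{2})\s+(?P<right>[^\s\]]{2})\]\s+"
    r"P:\s*(?P<target_pos>\d+)\s+(?P<query_pos>\d+)\s+"
    r"L:\s*(?P<length>\d+),\s*"
    r"G:\s*(?P<homology>[\d.]+),\s*"
    r"W:\s*(?P<weight>-?\d+),\s*"
    r"S:\s*(?P<score>-?[\d.]+)\s*$"
)
INT_FIELDS = ("num", "exon_start", "exon_len", "target_pos",
              "query_pos", "length", "weight")


def parse_block(line):
    """Parse one block line into a dict of typed fields."""
    m = BLOCK_RE.match(line)
    if m is None:
        raise ValueError(f"not a block line: {line!r}")
    block = m.groupdict()
    for key in INT_FIELDS:
        block[key] = int(block[key])
    block["homology"] = float(block["homology"])
    block["score"] = float(block["score"])
    # Capital letters: edge defined precisely; lowercase: imprecisely.
    block["left_precise"] = block["left"].isupper()
    block["right_precise"] = block["right"].isupper()
    # Assumption: 1-based inclusive coordinates.
    block["exon_end"] = block["exon_start"] + block["exon_len"] - 1
    return block


def intron_gaps(blocks):
    """Return (start, end) of the target region between consecutive exons."""
    return [(a["exon_end"] + 1, b["exon_start"] - 1)
            for a, b in zip(blocks, blocks[1:])]


SAMPLE_BLOCKS = [
    "1 E: 2214596 234 [ct CT] P: 2214596 1 L: 234, G: 99.57, W: 2305, S:26.2324",
    "2 E: 2214966 69 [AC CT] P: 2214966 235 L: 69, G: 100.00, W: 690, S:14.1834",
    "3 E: 2215144 65 [AC CT] P: 2215144 304 L: 65, G: 100.00, W: 650, S:13.7542",
    "4 E: 2215324 89 [AC aa] P: 2215324 369 L: 89, G: 97.75, W: 820, S:15.6754",
]
blocks = [parse_block(line) for line in SAMPLE_BLOCKS]
gaps = intron_gaps(blocks)
print(gaps)
```

For the sample, the last exon ends at 2215412, which agrees with the end of the alignment bounds on the first sequence, and the block lengths add up to the reported 457.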
1 gagccaagattgtgc(..)acgctcaggccacct?[CTGGGCCTCTCTTTATTGAGGGCA
...............(..)............... ||||||||||||||||||||||||
1 ---------------(..)--------------- CTGGGCCTCTCTTTATTGAGGGCA
Line 1 - The target sequence. Capital letters correspond to blocks of similarity, lowercase letters to unaligned regions. [] - edges of an exon. ?[ - uncertain edge of an exon.
Line 2 - Separator line.
Line 3 - The query sequence. Capital letters correspond to blocks of similarity, lowercase letters to unaligned regions.
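The three-line layout can be illustrated by rebuilding a separator line from a pair of aligned rows. In the sample output the separator shows `|` under identical letters and `0` under a mismatch; that reading is inferred from the sample and is not stated in the description above. `compare_rows` is a hypothetical helper, and the sketch handles only the capital-letter (aligned) part of a row.

```python
def compare_rows(target, query):
    """Return (separator, percent_identical) for two aligned rows.

    Assumption: '|' marks identical letters and '0' marks a mismatch,
    as seen in the sample output.
    """
    if len(target) != len(query):
        raise ValueError("aligned rows must have the same length")
    marks = "".join("|" if t == q else "0" for t, q in zip(target, query))
    identical = 100.0 * marks.count("|") / len(marks) if marks else 0.0
    return marks, identical


# Rows copied from the sample (target position 2214740, query position 145).
TARGET_ROW = "ACAGAAGCCCCTCTGGGCCGGCAGGGGAAGGCCCAGCCTCAATCCTTCTTGCTCCCGTGC"
QUERY_ROW = "ACAGAAGCCCCTCTGGGCCGGCAAGGGAAGGCCCAGCCTCAATCCTTCTTGCTCCCGTGC"

marks, identical = compare_rows(TARGET_ROW, QUERY_ROW)
print(marks)
print(f"{identical:.2f}")
```

Counting marks over a whole block gives a value comparable to the G field: block 4 of the sample shows two `0` marks over 89 positions, that is 87 / 89, or 97.75 percent.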