TSSP
Recognition of plant Pol II promoter regions and transcription start sites
The algorithm predicts potential transcription start positions using a linear discriminant function (LDF) that combines characteristics describing the functional motifs and the oligonucleotide composition of these sites. TSSP uses a file of selected transcription factor binding sites from the RegSite DB (Plants), developed by Softberry Inc.
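As a rough illustration of how a linear discriminant function turns several site characteristics into one score, the sketch below computes a weighted sum of feature values plus a bias and compares it with a threshold. The feature names, weights and bias are hypothetical placeholders, not TSSP's trained parameters. The two thresholds are copied from the sample report's threshold line; applying them by TATA class in this way is an assumption made for illustration.

```python
from typing import Dict

# Thresholds copied from the sample report ("TATA+ promoters - 0.02, TATA-/enhancers - 0.04").
# Selecting them by TATA presence like this is an illustrative assumption.
THRESHOLDS = {True: 0.02, False: 0.04}

def ldf_score(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Linear discriminant: bias + sum of weight * feature value (missing features count as 0)."""
    return bias + sum(w * features.get(name, 0.0) for name, w in weights.items())

def is_predicted(score: float, has_tata: bool) -> bool:
    """Report a candidate when its LDF score reaches the threshold for its class."""
    return score >= THRESHOLDS[has_tata]

# Hypothetical weights and feature values for one candidate window (not TSSP's values).
weights = {"tata_score": 0.01, "motif_score": 0.5, "oligo_score": -0.2}
features = {"tata_score": 5.0, "motif_score": 0.3, "oligo_score": 0.5}
bias = -0.07

score = ldf_score(features, weights, bias)
print(f"LDF = {score:.3f}, TATA+: {is_predicted(score, True)}, TATA-: {is_predicted(score, False)}")
```

With these made-up numbers the score (about 0.03) passes the lower TATA+ threshold but not the TATA-/enhancer one, which shows why the report lists two thresholds.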
References:
1. Solovyev V.V., Salamov A.A. (1997)
The Gene-Finder computer tools for analysis of human and model organisms genome sequences.
In Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology (eds. Rawling C., Clark D., Altman R., Hunter L., Lengauer T., Wodak S.), Halkidiki, Greece, AAAI Press, pp. 294-302.
2. Solovyev V.V. (2001)
Statistical approaches in eukaryotic gene prediction.
In Handbook of Statistical Genetics (eds. Balding D. et al.), John Wiley & Sons, Ltd., pp. 83-127.
3. Solovyev V.V., Shahmuradov I.A. (2003)
PromH: Promoters identification using orthologous genomic sequences.
Nucleic Acids Res. 31(13):3540-3545.
The output starts with a header line giving the program name and run date, followed by:
First line - name of your sequence;
Second and third lines - length of the submitted sequence and the LDF thresholds;
Fourth line - number of predicted promoter regions;
Next lines - positions of the predicted sites, their LDF 'weights', and the TATA box position (if found). The position is the first nucleotide of the transcript (the TSS position).
After that, the functional motifs are listed for each predicted region; (+) or (-) indicates the direct or complementary chain. Fields such as "RSP00004 tagaCACGTaga" give the identifier of a particular motif in the Softberry RegSite Plant database, followed by the site consensus for which a similar sequence was found.
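Because the report is plain text, downstream scripts usually parse it. A minimal sketch for the prediction lines, assuming the exact layout of the sample report (for example `Promoter Pos: 1522 LDF- 0.13 TATA box at 1488 18.93`). The regular expression and the `Prediction` record are hypothetical helpers, not part of TSSP, and the number following the TATA position is stored as a score on the assumption that this is what it represents.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed line layout, taken from the sample report:
#   "Promoter Pos: 1522 LDF- 0.13 TATA box at 1488 18.93"
#   "Enhancer Pos: 1597 LDF- 0.12"
PRED_RE = re.compile(
    r"^(?P<kind>Promoter|Enhancer)\s+Pos:\s*(?P<pos>\d+)\s+LDF-\s*(?P<ldf>-?\d+(?:\.\d+)?)"
    r"(?:\s+TATA box at\s+(?P<tata>\d+)\s+(?P<tata_score>-?\d+(?:\.\d+)?))?"
)

@dataclass
class Prediction:
    kind: str                          # "Promoter" or "Enhancer"
    tss: int                           # first nucleotide of the transcript
    ldf: float                         # LDF weight of the prediction
    tata_pos: Optional[int] = None     # TATA box position, if reported
    tata_score: Optional[float] = None # assumed to be the TATA box score

def parse_predictions(report: str) -> List[Prediction]:
    """Collect every promoter/enhancer line; all other lines are skipped."""
    preds = []
    for line in report.splitlines():
        m = PRED_RE.match(line.strip())
        if not m:
            continue
        preds.append(Prediction(
            kind=m["kind"],
            tss=int(m["pos"]),
            ldf=float(m["ldf"]),
            tata_pos=int(m["tata"]) if m["tata"] else None,
            tata_score=float(m["tata_score"]) if m["tata_score"] else None,
        ))
    return preds

sample = """2 promoter/enhancer(s) are predicted
Promoter Pos: 1522 LDF- 0.13 TATA box at 1488 18.93
Enhancer Pos: 1597 LDF- 0.12"""
preds = parse_predictions(sample)
for p in preds:
    print(p)
```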
tssp Wed Jul 10 02:52:32 EDT 2002
>gi|1902902|dbj|AB001920.1| Oryza sativa (japonica cultivar-group) gene for phos
Length of sequence- 5871
Thresholds for TATA+ promoters - 0.02, for TATA-/enhancers - 0.04
2 promoter/enhancer(s) are predicted
Promoter Pos: 1522 LDF- 0.13 TATA box at 1488 18.93
Enhancer Pos: 1597 LDF- 0.12
Transcription factor binding sites/RegSite DB:
for promoter at position - 1522
1468 (-) RSP00004 tagaCACGTaga
1459 (+) RSP00010 cACGTG
1456 (+) RSP00011 ctccACGTGgt
1461 (+) RSP00016 caTGCAC
1468 (-) RSP00016 caTGCAC
1256 (-) RSP00026 gcttttgaTGACtTcaaacac
1460 (+) RSP00065 ACGTGgcgc
1460 (+) RSP00066 ACGTGccgc
1459 (+) RSP00069 tACGTG
1341 (+) RSP00071 GACGTC
1346 (-) RSP00071 GACGTC
1452 (-) RSP00096 GGTTT
1432 (+) RSP00129 CACGAC
1281 (+) RSP00148 CGACG
1284 (+) RSP00148 CGACG
1315 (+) RSP00148 CGACG
1335 (+) RSP00148 CGACG
1340 (+) RSP00148 CGACG
1365 (+) RSP00148 CGACG
1434 (+) RSP00148 CGACG
1458 (+) RSP00148 CGACG
1347 (-) RSP00148 CGACG
1474 (+) RSP00162 ACACccGagctaaccacaac
1348 (+) RSP00241 CGGTCA
1387 (+) RSP00339 RTTTTTR
1264 (-) RSP00397 AGTGGCGG
1268 (+) RSP00422 ACCGAC
1459 (+) RSP00423 GACGTG
1464 (-) RSP00424 CACGTC
1369 (-) RSP00431 rdygRCRGTTRs
1278 (-) RSP00432 cVacGGTaGGTgg
1249 (-) RSP00436 TTGACT
1260 (+) RSP00463 atttcatggCCGACctgcttttt
1260 (+) RSP00464 acttgatggCCGACctctttttt
1260 (+) RSP00465 aatatactaCCGACcatgagttct
1265 (+) RSP00466 actaCCGACatgagttccaaaaagc
1440 (+) RSP00469 GNGGTG
1260 (-) RSP00469 GNGGTG
1440 (+) RSP00470 GTGGNG
1263 (-) RSP00470 GTGGNG
1257 (-) RSP00470 GTGGNG
1390 (+) RSP00477 TTTAA
1385 (+) RSP00508 gcaTTTTTatca
1502 (-) RSP00508 gcaTTTTTatca
1469 (+) RSP00518 tccctACACgcGtcacaattc
1465 (+) RSP00519 caattcaggACACgtGccctcttca
1474 (+) RSP00521 ACACccG
1474 (+) RSP00523 ACACgcG
1474 (+) RSP00524 ACACgtG
for promoter at position - 1597
1468 (-) RSP00004 tagaCACGTaga
1459 (+) RSP00010 cACGTG
1456 (+) RSP00011 ctccACGTGgt
1461 (+) RSP00016 caTGCAC
1468 (-) RSP00016 caTGCAC
1460 (+) RSP00065 ACGTGgcgc
1460 (+) RSP00066 ACGTGccgc
1459 (+) RSP00069 tACGTG
1341 (+) RSP00071 GACGTC
1346 (-) RSP00071 GACGTC
1452 (-) RSP00096 GGTTT
1432 (+) RSP00129 CACGAC
1315 (+) RSP00148 CGACG
1335 (+) RSP00148 CGACG
1340 (+) RSP00148 CGACG
1365 (+) RSP00148 CGACG
1434 (+) RSP00148 CGACG
1458 (+) RSP00148 CGACG
1347 (-) RSP00148 CGACG
1474 (+) RSP00162 ACACccGagctaaccacaac
..............................
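The binding-site lines follow a fixed pattern of position, strand, motif identifier and consensus. A minimal sketch that parses them and counts hits per motif, assuming that layout and that all identifiers start with `RSP`, as in the sample. `Site` and `parse_sites` are hypothetical names used only for illustration.

```python
import re
from collections import Counter
from typing import List, NamedTuple

# Assumed site-line layout from the sample report: "<position> (<strand>) <motif id> <consensus>"
SITE_RE = re.compile(r"^(\d+)\s+\(([+-])\)\s+(RSP\d+)\s+(\S+)$")

class Site(NamedTuple):
    pos: int         # position reported by TSSP
    strand: str      # "+" direct chain, "-" complementary chain
    motif_id: str    # RegSite Plant identifier, e.g. RSP00148
    consensus: str   # lowercase letters = non-conserved nucleotides

def parse_sites(block: str) -> List[Site]:
    """Parse site lines, ignoring anything that does not match the assumed layout."""
    sites = []
    for line in block.splitlines():
        m = SITE_RE.match(line.strip())
        if m:
            sites.append(Site(int(m[1]), m[2], m[3], m[4]))
    return sites

block = """for promoter at position - 1522
1341 (+) RSP00071 GACGTC
1346 (-) RSP00071 GACGTC
1281 (+) RSP00148 CGACG
1284 (+) RSP00148 CGACG
1347 (-) RSP00148 CGACG"""
sites = parse_sites(block)
print(Counter(s.motif_id for s in sites).most_common())  # [('RSP00148', 3), ('RSP00071', 2)]
```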
Lowercase letters denote non-conserved nucleotides in the site consensus.
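Since case carries the conservation information, the conserved part of a consensus can be read off by masking the lowercase positions. A minimal sketch; the helper names are hypothetical, and `.` is used as the placeholder only so that conserved letters keep their original offsets (conserved positions are not always contiguous, as in `cVacGGTaGGTgg`).

```python
from typing import List, Tuple

def conserved_positions(consensus: str) -> List[Tuple[int, str]]:
    """Return (0-based offset, letter) for each conserved (uppercase) position."""
    return [(i, ch) for i, ch in enumerate(consensus) if ch.isupper()]

def conserved_mask(consensus: str) -> str:
    """Replace non-conserved (lowercase) positions with '.', keeping conserved letters in place."""
    return "".join(ch if ch.isupper() else "." for ch in consensus)

print(conserved_mask("tagaCACGTaga"))   # ....CACGT...
print(conserved_mask("cVacGGTaGGTgg"))  # .V..GGT.GGT..
```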
Letters other than A, T, G and C denote ambiguous positions in a DNA sequence motif: a single character may represent more than one nucleotide, following the standard IUPAC nucleotide code.
See the table at http://www.yeastract.com/help/help_searchbydnamotif.php#Ref1
| IUPAC Code | Meaning | Origin of Description |
|---|---|---|
| G | G | Guanine |
| A | A | Adenine |
| T | T | Thymine |
| C | C | Cytosine |
| R | G or A | puRine |
| Y | T or C | pYrimidine |
| M | A or C | aMino |
| K | G or T | Ketone |
| S | G or C | Strong interaction |
| W | A or T | Weak interaction |
| H | A or C or T | not-G, H follows G in the alphabet |
| B | G or T or C | not-A, B follows A in the alphabet |
| V | G or C or A | not-T (not-U), V follows U in the alphabet |
| D | G or A or T | not-C, D follows C in the alphabet |
| N | G or A or T or C | aNy |
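The IUPAC table above maps directly onto regular-expression character classes, which allows a degenerate motif to be searched on both chains. This is a minimal sketch, not TSSP's matching procedure, which scores similarity rather than requiring exact matches. One convention is an assumption inferred from the sample report: there, the palindromic site GACGTC is listed at 1341 (+) and 1346 (-), so a (-) hit appears to be reported at its rightmost nucleotide on the direct strand, which is the 5' end on the complementary chain. The helper names are hypothetical.

```python
import re
from typing import List, Tuple

# Standard IUPAC nucleotide codes, as listed in the table above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
# Complement of each code (e.g. R=AG pairs with Y=CT, H=ACT pairs with D=AGT).
COMPLEMENT = str.maketrans("ACGTRYMKSWHBVDN", "TGCAYRKMSWDVBHN")

def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif (either case) into a regex of character classes."""
    return "".join(f"[{IUPAC[ch]}]" for ch in motif.upper())

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """1-based hits: (+) at the leftmost base, (-) at the rightmost base (assumed convention)."""
    seq = seq.upper()
    # Lookahead patterns so that overlapping hits are all reported.
    fwd = re.compile(f"(?=({iupac_to_regex(motif)}))")
    rev = re.compile(f"(?=({iupac_to_regex(reverse_complement(motif))}))")
    hits = [(m.start() + 1, "+") for m in fwd.finditer(seq)]
    hits += [(m.start() + len(motif), "-") for m in rev.finditer(seq)]
    return sorted(hits)

print(find_motif("CGACGTCG", "GACGTC"))  # [(2, '+'), (7, '-')]
print(iupac_to_regex("GNGGTG"))
```

In the short test sequence, GACGTC is found at 2 (+) and 7 (-), which mirrors the 1341 (+) / 1346 (-) pair in the sample output.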