Nsite
Search for consensus patterns with statistical estimation.
Nsite can be used to analyze regulatory regions and the composition of their functional motifs.
The method is based on statistical estimation of the expected number of occurrences of a nucleotide consensus pattern in a given sequence [1-2,4]. It uses a datafile in Nsite format, which can include any set of consensus sequences of functional motifs. In the current version, this file consists of the TRANSFAC sequences (release 3.4, 1998, academic release), composite elements [3], and a set of additional functional motifs.
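The notion of an expected number can be illustrated under a simple assumption: nucleotides occur independently with fixed frequencies, so a consensus position matches with probability equal to the summed frequencies of the bases its code allows (Table 1). The Python sketch below is only an illustration, not Nsite's implementation; the function names and the `max_mismatches` parameter are hypothetical, and how Nsite turns its homology level into allowed mismatches is not described here.

```python
# Hypothetical sketch: expected number of occurrences of an IUPAC consensus
# in a random sequence whose nucleotides are independent (Bernoulli model).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "H": "ACT", "B": "CGT",
    "V": "ACG", "D": "AGT", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Reverse lookup: sorted set of bases -> IUPAC code.
CODE_FOR_SET = {"".join(sorted(bases)): code for code, bases in IUPAC.items()}


def revcomp_consensus(pattern: str) -> str:
    """Reverse complement of an IUPAC consensus (complements each allowed set)."""
    return "".join(
        CODE_FOR_SET["".join(sorted(COMPLEMENT[b] for b in IUPAC[code]))]
        for code in reversed(pattern.upper())
    )


def match_probability(pattern: str, freqs: dict, max_mismatches: int = 0) -> float:
    """P(a random window matches `pattern` with at most `max_mismatches` mismatches)."""
    # dist[j] = probability of exactly j mismatches among the positions seen so far
    dist = [1.0] + [0.0] * max_mismatches
    for code in pattern.upper():
        q = sum(freqs[b] for b in IUPAC[code])  # chance that this position matches
        new = [0.0] * (max_mismatches + 1)
        for j, pj in enumerate(dist):
            new[j] += pj * q
            if j < max_mismatches:
                new[j + 1] += pj * (1.0 - q)
        dist = new
    return sum(dist)


def expected_number(pattern: str, seq_len: int, freqs: dict,
                    max_mismatches: int = 0) -> float:
    """Expected number of matches on both strands of a sequence of length seq_len."""
    positions = max(seq_len - len(pattern) + 1, 0)
    p_plus = match_probability(pattern, freqs, max_mismatches)
    p_minus = match_probability(revcomp_consensus(pattern), freqs, max_mismatches)
    return positions * (p_plus + p_minus)


if __name__ == "__main__":
    # Frequencies taken from the sample output; the 16-nt poly-G consensus is
    # illustrative only and is not claimed to be the RSA00620 entry.
    freqs = {"A": 0.33, "G": 0.19, "T": 0.30, "C": 0.18}
    for k in range(3):
        print(k, expected_number("G" * 16, 2319, freqs, max_mismatches=k))
```

A small dynamic program over mismatch counts gives the probability without enumerating windows. Note that in this sketch a consensus equal to its own reverse complement is counted once per strand.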
If a pattern whose expected number is significantly less than 1 is nevertheless found, it can be supposed that the analyzed sequence possesses the pattern's function.
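One way to make "significantly less than 1" concrete is to assume that chance occurrences are approximately Poisson-distributed with mean equal to the expected number λ; the probability of seeing at least one occurrence by chance is then 1 - exp(-λ), which is close to λ when λ is small. This Poisson approximation is an assumption for illustration only, and the helper names below are hypothetical; the statistics Nsite actually applies are those of [1-2,4].

```python
import math


def prob_at_least_one(expected: float) -> float:
    """P(X >= 1) for X ~ Poisson(expected); expm1 keeps tiny values accurate."""
    return -math.expm1(-expected)


def poisson_tail(n_observed: int, expected: float) -> float:
    """P(X >= n_observed) for X ~ Poisson(expected)."""
    if n_observed <= 0:
        return 1.0
    if expected <= 0.0:
        return 0.0

    def log_pmf(i: int) -> float:
        # log P(X = i), computed in log space to avoid overflow/underflow
        return -expected + i * math.log(expected) - math.lgamma(i + 1)

    if n_observed <= expected:
        # Large tail: 1 - P(X < n) is numerically safe here.
        cdf = math.fsum(math.exp(log_pmf(i)) for i in range(n_observed))
        return max(0.0, 1.0 - cdf)
    # Small tail: sum P(X = i) for i >= n directly. Because i > expected,
    # each term is smaller than the previous one; stop once they are negligible.
    term = math.exp(log_pmf(n_observed))
    total = 0.0
    i = n_observed
    while term > total * 1e-17:
        total += term
        i += 1
        term *= expected / i
    return total


if __name__ == "__main__":
    for lam in (1e-6, 0.01, 0.5, 2.0):
        print(f"expected={lam:g}  P(at least one by chance)={prob_at_least_one(lam):.3g}")
```

For example, with an expected number of 0.01 the chance of at least one random occurrence is about 0.00995, so a found site is unlikely to be a chance match under this model.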
For each pattern found, the Nsite output shows its position in the sequence, the accession number, ID, motif description, and binding factor name from the original database, if available.
Table 1. Summary of single-letter code recommendations
Symbol | Meaning | Origin of designation |
G | G | Guanine |
A | A | Adenine |
T | T | Thymine |
C | C | Cytosine |
R | G or A | puRine |
Y | T or C | pYrimidine |
M | A or C | aMino |
K | G or T | Keto |
S | G or C | Strong interaction (3 H bonds) |
W | A or T | Weak interaction (2 H bonds) |
H | A or C or T | not-G, H follows G in the alphabet |
B | G or T or C | not-A, B follows A |
V | G or C or A | not-T (not-U), V follows U |
D | G or A or T | not-C, D follows C |
N | G or A or T or C | aNy |
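Table 1 translates directly into a lookup from each code to the set of bases it allows. The Python sketch below uses that lookup to scan a sequence on both strands with a hypothetical mismatch limit. The function names are made up for illustration, and lower-casing mismatched bases and reporting minus-strand hits from the higher to the lower coordinate merely imitate the sample output below; this is not Nsite's code.

```python
# Hypothetical scanner: windows matching an IUPAC consensus on either strand
# with at most max_mismatches mismatches (a sketch, not Nsite's implementation).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "H": "ACT", "B": "CGT",
    "V": "ACG", "D": "AGT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mismatch_positions(window: str, consensus: str) -> list:
    """Indices where the base is not allowed by the consensus code (Table 1)."""
    return [i for i, (base, code) in enumerate(zip(window, consensus))
            if base not in IUPAC[code]]


def render(window: str, bad: list) -> str:
    """Lower-case the mismatched bases, imitating the sample output."""
    return "".join(b.lower() if i in bad else b for i, b in enumerate(window))


def scan(seq: str, consensus: str, max_mismatches: int = 0) -> list:
    """Return (strand, from, to, motif, mismatches) tuples with 1-based coordinates."""
    seq, consensus = seq.upper(), consensus.upper()
    m = len(consensus)
    hits = []
    for start in range(len(seq) - m + 1):
        window = seq[start:start + m]
        # "+" strand: compare the window as read.
        bad = mismatch_positions(window, consensus)
        if len(bad) <= max_mismatches:
            hits.append(("+", start + 1, start + m, render(window, bad), len(bad)))
        # "-" strand: reverse complement of the same window, reported from the
        # higher to the lower coordinate.
        rc = window.translate(COMPLEMENT)[::-1]
        bad = mismatch_positions(rc, consensus)
        if len(bad) <= max_mismatches:
            hits.append(("-", start + m, start + 1, render(rc, bad), len(bad)))
    return hits


if __name__ == "__main__":
    # Illustrative sequence and consensus; S stands for G or C (Table 1).
    for hit in scan("AAATGACTCATTT", "TGASTCA"):
        print(hit)
```

Because S matches either G or C, the illustrative site is found on both strands at the same coordinates.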
Example of Nsite output:

    Program NSITE (Softberry Inc.)
    Version 2.2004
    Search for motifs of 1500 Regulatory Elements (REs)
    SET of REs: REGSITE DB (Transcription Regulatory Sites from human and animals)
    [ Last Update: March 10, 2006]
    ____________________________________________________________
    Search PARAMETRS:
     Expected Mean Number                         : 0.0000000
     Statistical Siginicance Level                : 0.0000000
     Level of homology between known RE and motif : 80%
     Variation of Distance between RE Blocks      : 20%
    NOTE:
     RE - Regulatory Element/Consensus | AC - Accession No of RE in a given DB
     OS - Organism/Species             | BF - Binding Factor or One of them
     Mism. - Mismatches                | Mean. Exp. Number - Mean Expected Number
     Up.Conf.Int. - Upper Confidence Interval
    ============================================================
    QUERY: >test_nsite.seq
    Length of Query Sequence: 2319 bp | Nucleotide Frequencies: A - 0.33 G - 0.19 T - 0.30 C - 0.18
    ............................................................
    RE: 620. AC: RSA00620//OS: chicken /GENE: BGP/RE: G-string /BF: erythrocyte-specific protein
    Motifs on "-" Strand: Mean Exp. Number 0.00000 Up.Conf.Int. 1 Found 5
      2216 cGGGGGGGGGGGGGGG 2201 (Mism.= 1)
      2215 GGGGGGGGGGGGGGGG 2200 (Mism.= 0)
      2214 GGGGGGGGGGGGGGGG 2199 (Mism.= 0)
      2213 GGGGGGGGGGGGGGGG 2198 (Mism.= 0)
      2212 GGGGGGGGGGGGGGGt 2197 (Mism.= 1)
    ............................................................
    Totally 5 motifs of 1 different REs have been found
    ------------------------------------------------------------
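The sample report has a regular layout, so its hits can be pulled out with a regular expression. The Python sketch below is a hypothetical helper written against this one sample only: it remembers the latest "AC:" accession and strand header and collects lines of the form "from motif to (Mism.= n)". Reading lower-case letters as the mismatched bases is an inference from the sample, not documented behavior.

```python
import re

# Hypothetical parser for the sample report layout: tracks the latest "AC:"
# accession and strand header, and collects "<from> <motif> <to> (Mism.= <n>)".
TOKEN = re.compile(
    r"AC:\s*(?P<ac>[^/\s]+)"
    r'|Motifs on "(?P<strand>[+-])" Strand'
    r"|(?P<start>\d+)\s+(?P<motif>[A-Za-z]+)\s+(?P<end>\d+)\s+\(Mism\.=\s*(?P<mism>\d+)\)"
)


def parse_hits(report: str) -> list:
    """Return one dict per reported motif occurrence."""
    accession = strand = None
    hits = []
    for m in TOKEN.finditer(report):
        if m.group("ac"):
            accession = m.group("ac")
        elif m.group("strand"):
            strand = m.group("strand")
        else:
            motif = m.group("motif")
            hits.append({
                "accession": accession,
                "strand": strand,
                "start": int(m.group("start")),
                "end": int(m.group("end")),
                "motif": motif,
                "mismatches": int(m.group("mism")),
                # Assumption: lower-case letters mark the mismatched bases.
                "mismatch_positions": [i for i, c in enumerate(motif) if c.islower()],
            })
    return hits
```

Since `finditer` does not rely on line breaks, the same pattern works whether the report is line-broken or flattened into a single line.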
References:
[1] Shahmuradov K.A., Kolchanov N.A., Solovyev V.V., Ratner V.A. (1986) Enhancer-like structures in middle repetitive sequences of the eukaryotic genomes. Genetics (Russ.), 22, 357-368.
[2] Solovyev V.V., Kolchanov N.A. (1994) Search for functional sites using consensus. In: Computer Analysis of Genetic Macromolecules (eds. Kolchanov N.A., Lim H.A.), World Scientific, pp. 16-21.
[3] Heinemeyer T., Chen X., Karas H., Kel A.E., Kel O.V., Liebich I., Meinhardt T., Reuter I., Schacherer F., Wingender E. (1999) Expanding the TRANSFAC database towards an expert system of regulatory molecular mechanisms. Nucleic Acids Res., 27, 318-322.
[4] Solovyev V.V. (2002) Structure, properties and computer identification of eukaryotic genes. In: Bioinformatics from Genomes to Drugs, Vol. 1: Basic Technologies (ed. Lengauer T.), pp. 59-111.