LCRep

Program for mapping low complexity regions in nucleotide sequences.

Algorithm description

The search for low complexity regions uses Shannon's information measure, which is defined as follows:

    H = -\sum_{i=1}^{k} P(a_i) \log P(a_i)

where {a1, ..., ak} is an alphabet of size k, and P(ai) is the fractional composition of ai, that is, the fraction of positions occupied by the symbol ai.
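
As an illustration of the measure, here is a minimal Python sketch that computes Shannon's information for a single window. The function name shannon_information is hypothetical, and the base-2 logarithm (result in bits) is an assumption, since the base used by LCRep is not stated here.

```python
import math
from collections import Counter


def shannon_information(window: str) -> float:
    """Shannon's information of a window, in bits (base 2 is an assumption)."""
    if not window:
        raise ValueError("window must not be empty")
    n = len(window)
    # Count each symbol; case is ignored so 'a' and 'A' are the same letter.
    counts = Counter(window.upper())
    # sum(p * log2(1/p)) is the same quantity as -sum(p * log2(p)).
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

Under these assumptions a homopolymer window such as "AAAA" scores 0, and a window with equal amounts of A, C, G and T scores 2 bits, the maximum for a four-letter alphabet.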

The search is carried out as follows. For each position i of the sequence S, Shannon's information H(i, l) is calculated in a window of size l, for every l in the range [lbegin, lend]. If H(i, l) falls below a prespecified threshold Hthr(l), the fragment [i, i+l] is declared low complexity. At the end of the calculation, the overlapping fragments are combined to give a map of the low complexity regions of the sequence S.
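
The search described above can be sketched in Python as follows. This is an illustrative reading of the description, not the LCRep implementation: the names low_complexity_map and threshold are hypothetical, the logarithm base is assumed to be 2, each fragment is taken to cover exactly l positions starting at i, and overlapping or touching fragments are assumed to be merged into one region.

```python
import math
from collections import Counter


def shannon_information(window):
    """Shannon's information of a window, in bits (base 2 is an assumption)."""
    n = len(window)
    return sum((c / n) * math.log2(n / c) for c in Counter(window).values())


def low_complexity_map(seq, l_begin, l_end, threshold):
    """Return merged low complexity regions as 0-based half-open (start, end) pairs.

    threshold is a callable that maps a window size l to Hthr(l).
    """
    seq = seq.upper()
    fragments = []
    for l in range(l_begin, l_end + 1):          # every window size in the range
        h_thr = threshold(l)
        for i in range(len(seq) - l + 1):        # every window start position
            if shannon_information(seq[i:i + l]) < h_thr:
                fragments.append((i, i + l))

    # Merge overlapping or touching fragments into a single map (assumption).
    fragments.sort()
    merged = []
    for start, end in fragments:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]
```

For example, low_complexity_map(seq, 6, 8, lambda l: 0.5) scans windows of 6 to 8 positions with a constant threshold. This direct version recomputes the composition for every window, which keeps the sketch short; updating the counts incrementally as the window slides would be faster on long sequences.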

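Output examples

The examples below show the same sequence in three forms: a list of the masked regions, the sequence with low complexity regions replaced by N, and the sequence in lower case with low complexity regions in upper case.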


>c20
Masked regions:
p1:  90       p2:  115      l: 26        chain(+) [Low Complexity Region]
p1: 220       p2:  240      l: 21        chain(+) [Low Complexity Region]
....


>c20
GCCAAGAAGATATGTAGCATTAAGGTTTAGAATACAGGCTTTGAAGTCAAACAGACCAGAGTTAACAACCTCATTTTGTT
TTTATTTTCNNNNNNNNNNNNNNNNNNNNNNNNNNCTTTAAGTTCTAGGGTACATGTGCACAACGTGCAGGTTTGTTACA
TATGTATACATGTGCCATGTTGGTGTGCTGCACCCATTAACTGGACATTTACATTAGGTNNNNNNNNNNNNNNNNNNNNN
CCCTCCTCCCCTTACCCCACAACAGGCCCCGGTGTGTGATGTTCCCCTTCCTGTGTCCAAGTGTTCTCATTGTTCAGTTC
....


>c20
gccaagaagatatgtagcattaaggtttagaatacaggctttgaagtcaaacagaccagagttaacaacctcattttgtt
tttattttcTTTTTTAAAATTTTTTTAAAATTATActttaagttctagggtacatgtgcacaacgtgcaggtttgttaca
tatgtatacatgtgccatgttggtgtgctgcacccattaactggacatttacattaggtAAAAAAAAAAAAAAAAAAAAA
ccctcctccccttaccccacaacaggccccggtgtgtgatgttccccttcctgtgtccaagtgttctcattgttcagttc
....
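
The two sequence outputs above can be reproduced from a list of regions with a short Python sketch. The function mask_regions and its style argument are hypothetical names, and the coordinates are assumed to be 1-based and inclusive, which is how p1 and p2 appear to be used in the region list (l = p2 - p1 + 1).

```python
def mask_regions(seq, regions, style="N"):
    """Mask low complexity regions given as 1-based inclusive (p1, p2) pairs.

    style="N":    upper-case sequence, masked positions replaced by N.
    style="case": lower-case sequence, masked positions shown in upper case.
    """
    if style not in ("N", "case"):
        raise ValueError("style must be 'N' or 'case'")
    out = list(seq.upper() if style == "N" else seq.lower())
    for p1, p2 in regions:
        # Convert to 0-based indices and stay inside the sequence.
        for pos in range(max(p1 - 1, 0), min(p2, len(out))):
            out[pos] = "N" if style == "N" else out[pos].upper()
    return "".join(out)
```

With the regions from the first example, mask_regions(seq, [(90, 115), (220, 240)]) would give the N-masked form, and passing style="case" would give the mixed-case form.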