Search for potential splice sites

A combined linear discriminant recognition function was developed from two sources of information: the frequencies of significant triplets in the various functional parts of splice site regions, and the preferences of octanucleotides for protein-coding versus intron regions. On the test set, the scheme recognizes 97% of donor sites (correlation coefficient C = 0.62) and 96% of acceptor sites (C = 0.48). The method is a good alternative to the neural network approach (Brunak et al., J. Mol. Biol., 1991), which reaches C = 0.61 at 95% accuracy of donor site prediction and C < 0.40 at 95% accuracy of acceptor site prediction.

The false positive rate of splice site prediction is relatively high: at 97% recognition of true sites, there is about one false positive per true site. More precise splice site positions can be obtained with the exon recognition programs (HEXON, FEXH) and the gene structure prediction program (FGENESH).
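The accuracy and C figures above come from a confusion matrix of predicted versus true sites. The sketch below assumes that C is the Matthews correlation coefficient, as used by Brunak et al.; the function names and the counts in the example are hypothetical and are not taken from the program's test set.

```python
import math


def correlation_coefficient(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient C from a confusion matrix.

    tp/fn: true sites recognized/missed; fp/tn: non-sites wrongly accepted/correctly rejected.
    Returns 0.0 when any row or column of the matrix is empty (C is undefined there).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def sensitivity(tp: int, fn: int) -> float:
    """Share of true sites that are recognized (the 'accuracy' on true sites)."""
    return tp / (tp + fn)


# Hypothetical counts: 100 true donor sites, 97 of them found, and one false
# positive per true site among 9,900 non-site GT positions.
tp, fn, fp = 97, 3, 100
tn = 9900 - fp
print(f"sensitivity = {sensitivity(tp, fn):.2f}, C = {correlation_coefficient(tp, tn, fp, fn):.3f}")
# prints: sensitivity = 0.97, C = 0.687
```

Unlike sensitivity, C also depends on how many negative candidate positions are scored, so C values are comparable only between methods evaluated on the same test set.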
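A combined linear discriminant function of this kind can be sketched as follows. This is not the program's implementation, only an illustration under stated assumptions: a single core window (-3..+6 around the GT) stands in for the several functional parts of the splice site region, the flank length, pseudocount and all function names are hypothetical, and the weights are fitted with Fisher's linear discriminant on two features (a triplet site-versus-non-site score and the drop in octanucleotide coding preference across the candidate donor).

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def log_ratio_table(fg: Sequence[str], bg: Sequence[str], k: int, pseudo: float = 1.0) -> Dict[str, float]:
    """log(P_fg(w) / P_bg(w)) for every k-mer w seen in either set, with additive smoothing."""
    def counts(seqs: Sequence[str]) -> Counter:
        c: Counter = Counter()
        for s in seqs:
            for i in range(len(s) - k + 1):
                c[s[i:i + k]] += 1
        return c

    cf, cb = counts(fg), counts(bg)
    vocab = set(cf) | set(cb)
    nf = sum(cf.values()) + pseudo * len(vocab)
    nb = sum(cb.values()) + pseudo * len(vocab)
    return {w: math.log((cf[w] + pseudo) / nf) - math.log((cb[w] + pseudo) / nb) for w in vocab}


def mean_score(region: str, table: Dict[str, float], k: int) -> float:
    """Average table score of the k-mers in a region; unseen k-mers score 0."""
    n = len(region) - k + 1
    if n <= 0:
        return 0.0
    return sum(table.get(region[i:i + k], 0.0) for i in range(n)) / n


def donor_features(seq: str, pos: int, tri: Dict[str, float], octa: Dict[str, float],
                   flank: int = 24) -> List[float]:
    """Features for a candidate donor; pos is the 0-based index of the G in its GT.

    Hypothetical choices: core window -3..+6, octamer windows of `flank` nt on each side.
    """
    core = seq[max(0, pos - 3):pos + 6]
    exon_side = seq[max(0, pos - flank):pos]
    intron_side = seq[pos:pos + flank]
    return [mean_score(core, tri, 3),
            mean_score(exon_side, octa, 8) - mean_score(intron_side, octa, 8)]


def fisher_lda_2d(pos: List[List[float]], neg: List[List[float]]) -> Tuple[List[float], float]:
    """Fisher weights w = S^-1 (m1 - m0) for two features; threshold halfway between projected means."""
    def mean(xs: List[List[float]]) -> List[float]:
        return [sum(x[j] for x in xs) / len(xs) for j in range(2)]

    m1, m0 = mean(pos), mean(neg)
    # Pooled within-class scatter matrix S = [[a, b], [b, d]].
    a = b = d = 0.0
    for xs, m in ((pos, m1), (neg, m0)):
        for x in xs:
            u, v = x[0] - m[0], x[1] - m[1]
            a += u * u
            b += u * v
            d += v * v
    a += 1e-6  # small ridge so S stays invertible on tiny training sets
    d += 1e-6
    det = a * d - b * b
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [(d * diff[0] - b * diff[1]) / det, (-b * diff[0] + a * diff[1]) / det]
    c = 0.5 * (sum(wi * mi for wi, mi in zip(w, m1)) + sum(wi * mi for wi, mi in zip(w, m0)))
    return w, c


def is_site(features: List[float], w: List[float], c: float) -> bool:
    """Linear discriminant decision: accept the candidate if w . x exceeds the threshold."""
    return sum(wi * xi for wi, xi in zip(w, features)) > c
```

In use, `tri` would be built with `log_ratio_table(true_site_cores, false_gt_cores, k=3)` and `octa` with `log_ratio_table(coding_seqs, intron_seqs, k=8)`; feature vectors for known sites and non-site GTs then train `fisher_lda_2d`. Raising the threshold `c` trades recognized true sites for fewer false positives.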