MolQuest

Test on-line:

Gene Finding

Gene Finding: Gene model construction, splice sites, protein-coding exons

Use of the programs in scientific publications

FGENESH is the fastest and most accurate ab initio gene prediction program available - for more details, see FGENESH help. Its variants that use similarity information, FGENESH+ (similar protein), FGENESH_C (similar cDNA) and FGENESH-2 (two homologous genomic sequences), greatly improve the accuracy of gene prediction when such similarity information is available. These programs can be accessed here.

To find genes in bacterial sequences, click here.

Our two best gene finders cannot be accessed at our site due to computing resource limitations. These are FGENESH++ (an automated version of FGENESH+) and FGENESH++C, which maps known mRNA/EST sequences from RefSeq and then performs FGENESH++-like gene prediction, resulting in fully automatic annotation of a quality similar to that of manual annotation.

FGENES, FGENES-M, and SPLM can be used on human sequences only.
BESTORF can be used for human, Drosophila and dicot plants.
SPL can be used for human, Drosophila, nematode, S. cerevisiae, and dicots.

FGENESH / HMM-based gene structure prediction (multiple genes, both chains) [Help]

FGENES / Pattern-based human gene structure prediction (multiple genes, both chains) [Help]

FGENES-M / Pattern-based prediction of multiple variants of human gene structure [Help]

FGENESH-M / Prediction of multiple variants of potential genes in genomic DNA [Example]

FGENESH_GC / HMM-based human gene prediction that allows the GC donor splice site structure [Help]

BESTORF / Finding potential coding fragments in EST/mRNA [Help]

FEX / Finding potential 5'-, internal and 3'-coding exons [Help]

SPL / Search for potential splice sites [Help]

SPLM / Search for human potential splice sites using weight matrices [Help]

RNASPL / Search for exon-exon junction positions in cDNA [Help]

FSPLICE / Finding splice sites in genomic DNA [Help]

You can now view all human genes - both known and predicted by our newest FGENESH++C gene finder - and also more than 20,000 human promoters predicted by our TSSW program in Softberry Genome Explorer. The October 2000 release of the human genome draft is shown with known and predicted genes, while the December 2000 release is shown with both genes and promoters.

Bacterial Promoter, Operon and Gene Finding

Use of the programs in scientific publications

fgenesB - Pattern/Markov chain-based bacterial operon and gene prediction [Help]

BPROM - Prediction of bacterial promoters [Help]

AbSplit - Separating archaeal and bacterial genomes [Help]

FindTerm - Finding terminators in bacterial genomes [Help]

General scheme of bacterial genome annotation (automatic pipeline - Fgenesb_annotator)

FGENESB is the fastest (the E. coli genome is annotated in ~14 sec) and most accurate ab initio bacterial operon and gene prediction program available - for more details, see FGENESB help. It uses genome-specific parameters learned by the FGENESB-train script, which requires only the DNA sequence of the genome of interest as input. The script automatically creates a file with gene prediction parameters for the analyzed genome. Creating such a file for the E. coli genome from its sequence took only a few minutes. If you need parameters for a new bacterium, please contact Softberry - we can include them in the web list.

In the current FGENESB version, a complex operon prediction model is implemented based on gene distances. It accurately recognizes 70% of single transcription units and exactly defines about 50% of operons (~92% partially). The accuracy of operon identification is further increased by using promoter and terminator predictions and by analyzing the neighboring locations of genes across many bacterial genomes.
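The distance-based idea can be illustrated with a minimal Python sketch. It is not the FGENESB model: the `Gene` record, the `group_by_distance` function and the 50 bp `max_gap` threshold are all hypothetical, chosen only to show how adjacent genes on the same strand might be grouped into candidate transcription units by their intergenic distance.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def group_by_distance(genes: List[Gene], max_gap: int = 50) -> List[List[Gene]]:
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap.

    max_gap is an illustrative threshold, not a published FGENESB parameter.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    units: List[List[Gene]] = []
    for gene in ordered:
        if units:
            prev = units[-1][-1]
            gap = gene.start - prev.end - 1  # negative when genes overlap
            if gene.strand == prev.strand and gap <= max_gap:
                units[-1].append(gene)
                continue
        # Start a new candidate unit (strand change or gap too large).
        units.append([gene])
    return units


genes = [
    Gene("a", 100, 400, "+"),
    Gene("b", 420, 900, "+"),
    Gene("c", 1500, 2000, "+"),
    Gene("d", 2010, 2600, "-"),
]
units = group_by_distance(genes)
print([[g.name for g in unit] for unit in units])
# [['a', 'b'], ['c'], ['d']]
```

A single fixed threshold is the simplest possible choice; promoter and terminator predictions, as mentioned above, would be additional evidence layered on top of such a grouping.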

We developed a new FGENESB-Annotator script that finds similar proteins in public databases and annotates predicted genes. This script can also identify low-scoring genes if they have a known homologous protein. The script annotates CDS, promoters, terminators, operons, tRNA and rRNA.

The annotation can be produced in GenBank format (see the example below) and exported to different bacterial genome browsers such as Artemis (Sanger Center) or Softberry Genome Explorer.

The FGENESB-Annotator script can also automatically annotate sets of sequences, such as scaffolds generated by sequencing bacterial communities. To separate archaeal sequences from bacterial sequences, which require different gene-finding parameters, use the ABsplit program.

Together, the FGENESB gene-finding program and the Train and Annotator scripts constitute the FGENESB pipeline - the most comprehensive tool for prokaryotic genome annotation. A description of the pipeline is given here.

EXAMPLE: Annotation of the Bacillus anthracis A2012 main chromosome by the FgenesB-Annotator script: annotation in GenBank format

Examples of annotation of operons and genes for other bacteria

Gene finding using similarity with EST, Protein or other genome sequence

Use of the programs in scientific publications

FGENESH+ / HMM plus similar protein-based gene prediction [Help]

Speed and Accuracy of Fgenesh+

PROT_MAP / mapping of a set of proteins on genome [Help]

FGENESH_C / HMM plus similar cDNA-based gene structure prediction (multiple genes, both chains) [Help]

FGENESH-2 / HMM gene prediction using two genomic sequences of closely related organisms (such as human and mouse) [Help]

FGENESH+ and FGENESH_C can be used when a protein or cDNA/EST sequence similar to the predicted gene is available. For example, you can run an ab initio gene-finding program such as FGENES or FGENESH and then run a BLASTP database search with the predicted exons. Any correctly predicted exon can lead you to a known similar protein, if such a protein exists in the database. Take the sequence of the homologous protein and run FGENESH+. The accuracy of gene prediction can be up to 100%, depending on how similar the predicted and database proteins are.

The FGENESH-2 program can be used when sequences from two related organisms, such as human and mouse, are available. The program gives a higher score to exons whose predicted amino acid sequences are homologous to those of the related organism's exons, which allows substantially more accurate exon prediction and gene assembly.

FGENESH++ is our newest gene prediction program. It works in the following steps: (1) performs ab initio gene prediction using the FGENESH algorithm; (2) runs the predicted amino acid sequences of all potential exons through the NR protein sequence database using the DBSCAN-P engine; and (3) runs a second round of gene prediction with higher scores assigned to exons homologous to known proteins.

The result is fully automated genome annotation of a quality similar to manual annotation. The program is extremely fast - the whole human genome is annotated in 20-30 hours on a machine such as a 500 MHz DEC Alpha. Due to computing resource constraints, we cannot make FGENESH++ available through the web site - you have to use FGENESH+ instead.
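The effect of the second prediction round in FGENESH++ can be illustrated with a toy Python sketch. It does not reproduce FGENESH++ scoring: the function names, the additive `bonus` and the selection `threshold` are hypothetical, and the set of homologous exons stands in for the result of a database search.

```python
from typing import Dict, List, Set


def rescore_exons(ab_initio_scores: Dict[str, float],
                  homologous: Set[str],
                  bonus: float = 5.0) -> Dict[str, float]:
    """Return new scores with `bonus` added to exons that have a protein hit.

    The additive bonus is an assumption made for illustration only.
    """
    return {
        exon: score + (bonus if exon in homologous else 0.0)
        for exon, score in ab_initio_scores.items()
    }


def select_exons(scores: Dict[str, float], threshold: float = 10.0) -> List[str]:
    """Keep exons whose score reaches the (hypothetical) threshold."""
    return sorted(exon for exon, score in scores.items() if score >= threshold)


first_pass = {"exon1": 12.0, "exon2": 7.5, "exon3": 3.0}  # step 1: ab initio scores
hits = {"exon2"}                                          # step 2: exons with a database hit
second_pass = rescore_exons(first_pass, hits)             # step 3: boosted scores

print(select_exons(first_pass))   # ['exon1']
print(select_exons(second_pass))  # ['exon1', 'exon2']
```

In this toy case a weak exon is rescued only because it has a homologous protein, which is the general idea behind giving higher scores to exons supported by similarity.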

We can license FGENESH++ for use on any organism except human. Results of FGENESH++ annotation of human genome can be licensed from Biomax Informatics.

Gene Finding in Viral Genomes

Use of the programs in scientific publications

The FGENESV algorithm is based on pattern recognition of different types of signals and on Markov chain models of coding regions. The optimal combination of these features is then found by dynamic programming, and a set of gene models is constructed along the given sequence.
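The dynamic programming step can be illustrated with a simplified Python sketch. It is not the FGENESV algorithm: it assumes that each candidate gene already has a single numeric score (however it was derived from signals and coding statistics), ignores strands, and simply selects the non-overlapping set of candidates with the highest total score. The name `best_gene_set` and the candidate tuples are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Candidate = Tuple[int, int, float]  # (start, end, score), inclusive coordinates


def best_gene_set(candidates: List[Candidate]) -> Tuple[float, List[Candidate]]:
    """Pick non-overlapping candidates with maximal total score."""
    cands = sorted(candidates, key=lambda c: c[1])  # sort by end position
    ends = [c[1] for c in cands]
    n = len(cands)
    best = [0.0] * (n + 1)   # best[i]: optimal score using the first i candidates
    take = [False] * (n + 1)  # take[i]: whether candidate i is part of best[i]
    prev = [0] * (n + 1)      # prev[i]: number of candidates ending before i starts
    for i, (start, _end, score) in enumerate(cands, start=1):
        # Count earlier candidates that end strictly before this one starts.
        p = bisect_right(ends, start - 1, 0, i - 1)
        prev[i] = p
        with_i = best[p] + score
        if with_i > best[i - 1]:
            best[i], take[i] = with_i, True
        else:
            best[i] = best[i - 1]
    # Trace back the chosen candidates.
    chosen: List[Candidate] = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(cands[i - 1])
            i = prev[i]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen


candidates = [(1, 300, 10.0), (250, 600, 12.0), (310, 900, 9.0), (650, 1000, 8.0)]
total, genes = best_gene_set(candidates)
print(total, genes)
# 20.0 [(250, 600, 12.0), (650, 1000, 8.0)]
```

The recurrence considers each candidate once, either adding it to the best compatible prefix or skipping it, so the optimum is found without enumerating all combinations.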

FGENESV is the fastest ab initio viral gene prediction program available.

We developed a new FGENESV-Annotator script that finds similar proteins in public databases and annotates predicted genes. This script can also identify low-scoring genes if they have a known homologous protein.

As an example of using FGENESV, the annotation of SARS coronavirus TOR2 genome is presented:

Annotation of the complete genome of the SARS-associated coronavirus by the FgenesV-Annotator script.

There are two variants of the viral gene prediction program: FGENESV0, which is suited for small (<10 kb) genomes and uses generic parameters of coding regions, and FGENESV, which learns genome-specific parameters using the viral genome sequence as input.

FGENESV predicts all intronless viral genes. To find the small group of genes that contain introns - normally alternative structures of intronless variants - standard eukaryotic gene-finding programs, such as FGENESH, can be used in addition to FGENESV.

As additional parameters, you can choose the Linear or Circular form of your virus and select an alternative genetic code (the Standard code is the default): the Bacterial and Plant Plastid Code (transl_table=11) or the Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code (transl_table=4).
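The practical effect of the genetic code choice can be seen in a short Python sketch. It is independent of FGENESV and only illustrates the NCBI translation tables: in transl_table=4, TGA encodes tryptophan instead of a stop, while transl_table=11 assigns the same amino acids as the standard code and differs only in the permitted start codons, which this sketch does not model. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of NCBI translation table 1, for codons in the order TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(transl_table: int = 1) -> dict:
    """Build a codon -> amino acid map for tables 1, 4 or 11."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    table = dict(zip(codons, STANDARD_AA))
    if transl_table == 4:
        table["TGA"] = "W"  # TGA is read as Trp rather than stop
    elif transl_table not in (1, 11):
        raise ValueError(f"unsupported transl_table: {transl_table}")
    return table


def translate(dna: str, transl_table: int = 1) -> str:
    """Translate frame 1 of `dna` up to the first stop codon."""
    table = codon_table(transl_table)
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table.get(dna[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


seq = "ATGTGAAAATAA"
print(translate(seq, 1))   # M
print(translate(seq, 4))   # MWK
print(translate(seq, 11))  # M
```

With the standard code the open reading frame ends at the first TGA, whereas under transl_table=4 the same sequence is read through to the TAA stop, so the wrong choice of code can truncate or extend predicted genes.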

fgenesV0 / Generic parameters Markov chain-based viral gene prediction [Help]

fgenesV / Trained Pattern/Markov chain-based viral gene prediction [Help]

Your use of Softberry programs signifies that you accept Terms of Use