PSF parameters

PSF

Input

Nucleotide sequence

A nucleotide FASTA file containing a single genomic sequence (without gaps).

Protein set

A MultiFASTA file with protein sequences, without gaps. Headers can include additional information in Softberry AbInitio or FGENESH++ format. For example, the IPI or NR database can be supplied as input.
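The input requirements above can be checked before running PSF. The following is a minimal sketch, not part of PSF itself: the function name `check_fasta` is hypothetical, and it assumes that "gaps" means the '-' character in sequence lines.

```python
def check_fasta(lines, single=False):
    """Count FASTA records in an iterable of lines.

    Raises ValueError if a sequence line contains '-' (assumed gap symbol),
    if sequence data precedes the first header, if no record is found,
    or if single=True and the input does not hold exactly one record.
    """
    records = 0
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records += 1  # header line starts a new record
        elif records == 0:
            raise ValueError(f"line {lineno}: sequence data before first header")
        elif "-" in line:
            raise ValueError(f"line {lineno}: gap character in sequence")
    if records == 0:
        raise ValueError("no FASTA records found")
    if single and records != 1:
        raise ValueError(f"expected a single sequence, found {records}")
    return records
```

For the nucleotide input, an open file can be passed with `single=True`, for example `check_fasta(open("genome.fa"), single=True)`; for the protein set, the default `single=False` applies. The file name here is only a placeholder.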

Output

Output file

A specially formatted file with pseudogene descriptions.

Fields are separated by the '@@' sequence.

List of fields:

chr	name of the chromosome (or other sequence) in which the search has been carried out
chain	chain
pos(dir.ch.)	(nt.) pseudogene start position (on the direct chain)
len(nt.)	(nt.) pseudogene length. Note that the pseudogene lies to the right of 'pos(dir.ch.)'
identity	(%) Identity with a protein (0...100%).
coverage	(%) Coverage of a protein with alignment
Ka/Ks	ratio calculated by the Nei-Gojobori method
uali.head	(yes/no) first codon of alignment is ATG
uali.tail	(yes/no) last codon of alignment is stop-codon
exons#,lower	number of exons, lower estimation
exons#,upper	number of exons, upper estimation
polyA	(yes/no) there is a polyA tail at the 3' terminus of alignment
polyA_signal	(yes/no) there is a polyA signal at the 3' terminus of alignment
corr.stops#	number of correctable (by one mismatch) in-frame stop codons
uncorr.stops#	number of uncorrectable (by one mismatch) in-frame stop codons
corr.frameshifts#	number of correctable (by one-nucleotide insertion/deletion) frameshifts
uncorr.frameshifts#	number of uncorrectable (by one-nucleotide insertion/deletion) frameshifts
prototype_chr	chromosome of prototype protein gene
prototype_prot_name	prototype protein gene name
prototype_exon#,lower	number of exons of prototype prot. gene, lower estimation
prototype_exon#,upper	number of exons of prototype prot. gene, upper estimation
DNA_identity	identity between the prototype gene and the pseudogene at the DNA level
CDS length	(nt.) CDS length
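
The field list above can be turned into a small reader for the output file. This is a sketch under stated assumptions rather than an official parser: it assumes one pseudogene record per line, with fields in exactly the order listed and no trailing separator. The function name `parse_psf_line` is hypothetical. Values are kept as strings because the exact number formats are not specified here.

```python
# Field names copied from the list above; the column order is an assumption.
FIELDS = [
    "chr", "chain", "pos(dir.ch.)", "len(nt.)", "identity", "coverage",
    "Ka/Ks", "uali.head", "uali.tail", "exons#,lower", "exons#,upper",
    "polyA", "polyA_signal", "corr.stops#", "uncorr.stops#",
    "corr.frameshifts#", "uncorr.frameshifts#", "prototype_chr",
    "prototype_prot_name", "prototype_exon#,lower", "prototype_exon#,upper",
    "DNA_identity", "CDS length",
]


def parse_psf_line(line):
    """Split one '@@'-separated record into a dict keyed by field name."""
    values = [v.strip() for v in line.rstrip("\n").split("@@")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))
```

A record can then be queried by field name, for example `parse_psf_line(line)["polyA"] == "yes"`. If the real file contains header or comment lines, they would need to be skipped before calling this function.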