PSF


Input
Nucleotide sequence
Nucleotide FASTA-file with a single genomic sequence (without gaps).
Protein set
MultiFASTA-file with protein sequences, without gaps. Headers can include additional information in Softberry AbInitio or FGENESH++ format. The IPI or NR database can be given as input here.
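
The input constraints above can be checked before a run. The following is a minimal sketch, not part of PSF: the function name `fasta_summary` is hypothetical, and it assumes that "gaps" means '-' characters in the sequence lines.

```python
def fasta_summary(text):
    """Return (number of FASTA records, True if any sequence line contains '-').

    Assumption: a "gap" is a '-' character; this is not stated in the PSF text.
    """
    records = 0
    has_gap = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records += 1  # each header line starts a new record
        elif "-" in line:
            has_gap = True
    return records, has_gap


if __name__ == "__main__":
    records, has_gap = fasta_summary(">chr1\nACGTACGT\n")
    print(records, has_gap)
```

For the nucleotide input one would expect exactly one record and no gaps; for the protein set, any number of records and no gaps.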
Output
Output file
Specially formatted file with pseudogene descriptions.

Fields are separated by the '@@' sequence.

List of fields:

chr name of the chromosome (or another sequence) in which the search has been carried out
chain chain (strand) on which the pseudogene lies
pos(dir.ch.) (nt.) pseudogene start position (in direct chain)
len(nt.) (nt.) pseudogene length. Note that the pseudogene lies to the right of 'pos(dir.ch.)'
identity (%) Identity with a protein (0...100%).
coverage (%) Coverage of the protein by the alignment
Ka/Ks ratio calculated by the Nei-Gojobori method
uali.head (yes/no) first codon of alignment is ATG
uali.tail (yes/no) last codon of alignment is stop-codon
exons#,lower number of exons, lower estimate
exons#,upper number of exons, upper estimate
polyA (yes/no) there is a polyA tail at the 3' terminus of alignment
polyA_signal (yes/no) there is a polyA signal at the 3' terminus of alignment
corr.stops# number of correctable (by one mismatch) in-frame stop codons
uncorr.stops# number of uncorrectable (by one mismatch) in-frame stop codons
corr.frameshifts# number of correctable (by one-nucleotide insertion/deletion) frameshifts
uncorr.frameshifts# number of uncorrectable (by one-nucleotide insertion/deletion) frameshifts
prototype_chr chromosome of prototype protein gene
prototype_prot_name prototype protein gene name
prototype_exon#,lower number of exons of prototype protein gene, lower estimate
prototype_exon#,upper number of exons of prototype protein gene, upper estimate
DNA_identity Identity between prototype gene and pseudogene at the level of DNA
CDS length (nt.) CDS length
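
Reading the output can be sketched as splitting each record on '@@' and pairing the values with the field names listed above. This is only an illustration under assumptions that the text does not confirm: one record per line, fields in the listed order, and no header line. The names `FIELDS` and `parse_record` are hypothetical, and values are kept as strings because the exact number formatting is not specified.

```python
# Field names taken from the list above; their order in the file is assumed.
FIELDS = [
    "chr", "chain", "pos(dir.ch.)", "len(nt.)", "identity", "coverage",
    "Ka/Ks", "uali.head", "uali.tail", "exons#,lower", "exons#,upper",
    "polyA", "polyA_signal", "corr.stops#", "uncorr.stops#",
    "corr.frameshifts#", "uncorr.frameshifts#", "prototype_chr",
    "prototype_prot_name", "prototype_exon#,lower", "prototype_exon#,upper",
    "DNA_identity", "CDS length",
]


def parse_record(line):
    """Split one '@@'-separated record into a dict keyed by field name."""
    values = [v.strip() for v in line.rstrip("\n").split("@@")]
    if len(values) != len(FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELDS), len(values))
        )
    return dict(zip(FIELDS, values))


if __name__ == "__main__":
    # Synthetic record with placeholder values, for illustration only.
    demo = "@@".join("v%d" % i for i in range(len(FIELDS)))
    record = parse_record(demo)
    print(record["chr"], record["CDS length"])
```

If the real file starts with a header line or wraps records across lines, the sketch would need to be adapted accordingly.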