CEGMA (Core Eukaryotic Genes Mapping Approach) is a pipeline for building a highly reliable set of gene annotations in virtually any eukaryotic genome. The strategy relies on a simple fact: some highly conserved proteins are encoded in essentially all eukaryotic genomes. We use the KOGs database to build a set of these highly conserved, ubiquitous proteins. We define a set of 458 core proteins and a protocol, CEGMA, for finding orthologs of the core proteins in new genomes and determining their exon-intron structures.
The procedure uses information from the core genes of six model organisms, first using TBLASTN to identify candidate regions in a new genome. It then proposes and refines gene structures using a combination of GeneWise, HMMER and geneid. The system uses a profile for each core protein to ensure the reliability of the gene structure.
Version 2.5 of the application is available to all users of MetaCentrum.
module add cegma-2.5
CEGMA requires access to BLAST+:
module add blast+-2.2.29
Afterwards you can submit the following command
You will see a list of parameters that you can use to run the application.
If you use multiple cores, specify the number of threads with the option
--threads <number>. You can use the variable $PBS_NUM_PPN to get the number of reserved CPUs.
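The thread setting above could be wired into a job script roughly as follows. This is only a sketch: the file name genome.fa and the output prefix sample are hypothetical, and the --genome and --output options are assumptions that should be checked against the parameter list printed by the application. The command is printed rather than executed so that it can be inspected first.

```shell
#!/bin/sh
# Sketch of a job script body; input and output names are hypothetical.

# Load the modules only where the "module" command exists (e.g. on MetaCentrum).
if command -v module >/dev/null 2>&1; then
    module add cegma-2.5
    module add blast+-2.2.29
fi

# Use the number of CPUs reserved by PBS; fall back to 1 outside a PBS job.
THREADS="${PBS_NUM_PPN:-1}"

GENOME="genome.fa"   # hypothetical input FASTA file
OUTPUT="sample"      # hypothetical output prefix

# Assemble the command line; --genome and --output are assumed option names.
CMD="cegma --genome $GENOME --output $OUTPUT --threads $THREADS"

# Dry run: print the command instead of executing it.
echo "$CMD"
```

Once the printed command looks right, it can be run directly in place of the echo line. The fallback to a single thread keeps the script usable outside a batch job, where $PBS_NUM_PPN is not set.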
Online documentation is available at http://korflab.ucdavis.edu/Datasets/cegma
The original article is available at http://bioinformatics.oxfordjournals.org/content/23/9/1061.full.pdf+html
License: GNU General Public License