CEGMA

From MetaCentrum


Description

CEGMA (Core Eukaryotic Genes Mapping Approach) is a pipeline for building a highly reliable set of gene annotations in virtually any eukaryotic genome. The strategy relies on a simple fact: some highly conserved proteins are encoded in essentially all eukaryotic genomes. We use the KOGs database to build a set of these highly conserved, ubiquitous proteins. We define a set of 458 core proteins and a protocol, CEGMA, to find orthologs of the core proteins in new genomes and to determine their exon-intron structures.

The procedure uses the core genes of six model organisms, first running TBLASTN to identify candidate regions in a new genome. It then proposes and refines gene structures using a combination of GeneWise, HMMER and geneid. A profile for each core protein is used to ensure the reliability of the predicted gene structure.

License

GNU GENERAL PUBLIC LICENSE

Usage

Upcoming module system change alert!

Due to the large number of applications and their versions, it is not practical to list them all explicitly on our wiki pages. Therefore, an upgrade of the modulefiles is underway. One feature of this upgrade will be a default module for every application. The default module does not require a version number and loads some (usually the latest) version.

You can test the new system now by adding the line

source /cvmfs/software.metacentrum.cz/modulefiles/5.1.0/loadmodules

to your script before loading a module. Then you can list all versions of cegma and load the default version as follows:

module avail cegma/ # list available modules
module load cegma   # load (default) module


If you wish to keep using the current system, you still can. Simply list all modules with

module avail cegma

and choose the explicit version you want to use.
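A minimal sketch for a job script, assuming you want to try the new module system but fall back to the current one when the loadmodules file is not available (for example on a machine without the /cvmfs mount). The function name enable_new_modules and the LOADMODULES variable are hypothetical; only the path comes from the text above.

```shell
#!/bin/bash
# Path to the new module system, as given above; it can be overridden
# by setting LOADMODULES before this snippet runs.
LOADMODULES="${LOADMODULES:-/cvmfs/software.metacentrum.cz/modulefiles/5.1.0/loadmodules}"

# enable_new_modules (hypothetical helper): source the new module system
# if the file is readable; otherwise print a warning and return 1 so the
# caller can continue with the current module system.
enable_new_modules() {
    if [ -r "$LOADMODULES" ]; then
        # shellcheck disable=SC1090
        source "$LOADMODULES"
    else
        echo "warning: $LOADMODULES not found, keeping the current module system" >&2
        return 1
    fi
}
```

In a job script you could then call enable_new_modules and, if it succeeds, run module load cegma; if it fails, load one of the explicit versions listed by module avail cegma instead.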

CEGMA requires access to BLAST+, so load a BLAST+ module as well. List the available versions with

module avail blast
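Before running CEGMA it can help to check that the BLAST+ programs are actually on PATH after loading the module. A minimal sketch follows; the function check_blast_tools is hypothetical, and the program list (tblastn, makeblastdb) is an assumption about which BLAST+ tools CEGMA calls, so adjust it if the CEGMA documentation says otherwise.

```shell
#!/bin/bash
# Assumed list of BLAST+ programs needed by CEGMA (adjust if necessary).
BLAST_TOOLS=(tblastn makeblastdb)

# check_blast_tools (hypothetical helper): report every program from
# BLAST_TOOLS that is not found on PATH; return 1 if any is missing.
check_blast_tools() {
    local missing=0 tool
    for tool in "${BLAST_TOOLS[@]}"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool (load a BLAST+ module first)" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_blast_tools || exit 1 near the top of a job script stops the job early with a readable message instead of letting it fail somewhere inside the pipeline.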

Afterwards, you can run the following command:

cegma

You will see a list of parameters that you can use to run the application.

If you use multiple cores, specify the number of threads with the option --threads <number>. You can use the variable $PBS_NUM_PPN to get the number of reserved CPUs:

--threads $PBS_NUM_PPN
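A small sketch for a job script, assuming (as stated above) that PBS_NUM_PPN holds the number of reserved CPUs. The hypothetical function cegma_threads falls back to a single thread when the variable is unset or not a positive integer, so the same script also runs outside a batch job.

```shell
#!/bin/bash
# cegma_threads (hypothetical helper): print the number of threads to pass
# to CEGMA's --threads option, taken from PBS_NUM_PPN when it holds a
# positive integer, otherwise 1.
cegma_threads() {
    local n="${PBS_NUM_PPN:-}"
    # Accept digits only; 10# forces base 10 so values like "08" work.
    if [[ "$n" =~ ^[0-9]+$ ]] && (( 10#$n > 0 )); then
        echo "$(( 10#$n ))"
    else
        echo 1
    fi
}
```

The call would then look like cegma [your options] --threads "$(cegma_threads)", with the remaining options taken from the parameter list printed by cegma.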

Documentation

Online documentation can be found at http://korflab.ucdavis.edu/Datasets/cegma. The original article is available at http://bioinformatics.oxfordjournals.org/content/23/9/1061.full.pdf+html

Homepage

http://korflab.ucdavis.edu/Datasets/cegma