CEGMA


Description

CEGMA (Core Eukaryotic Genes Mapping Approach) is a pipeline for building a highly reliable set of gene annotations in virtually any eukaryotic genome. The strategy relies on a simple fact: some highly conserved proteins are encoded in essentially all eukaryotic genomes. We use the KOGs database to build a set of these highly conserved, ubiquitous proteins. We define a set of 458 core proteins and a protocol, CEGMA, to find orthologs of the core proteins in new genomes and to determine their exon-intron structures.

The procedure uses information from the core genes of six model organisms by first using TBLASTN to identify candidate regions in a new genome. It then proposes and refines gene structures using a combination of GeneWise, HMMER and geneid. The system uses a profile for each core protein to ensure the reliability of the gene structure.

Availability

Version 2.5. The application is available to all MetaCentrum users.

Use

Load the module:

module add cegma-2.5

CEGMA requires access to BLAST+, so load that module as well:

module add blast+-2.2.29

Afterwards, you can run the following command:

cegma

You will see a list of parameters that you can use to run the application.

If you use multiple cores, specify the number of threads with the option --threads <number>. You can use the variable $PBS_NUM_PPN to get the number of reserved CPUs:

--threads $PBS_NUM_PPN
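
The steps above can be combined into a small helper for a job script. This is only a sketch under stated assumptions: the input file name genome.fa is hypothetical, and the --genome option is assumed here and should be checked against the parameter list printed by running cegma without arguments. The helper falls back to one thread when $PBS_NUM_PPN is not set, for example outside a PBS job.

```shell
#!/bin/bash
# Sketch only: genome.fa is a hypothetical input file, and --genome is an
# assumed option name; verify both against the output of plain `cegma`.

# Number of threads: CPUs reserved by PBS, or 1 when not inside a job.
cegma_threads() {
    echo "${PBS_NUM_PPN:-1}"
}

# Print the cegma command line for the given genome FASTA file.
cegma_command() {
    local genome="$1"
    echo "cegma --genome ${genome} --threads $(cegma_threads)"
}

# In a real job script, load the modules first and then run the command:
#   module add cegma-2.5
#   module add blast+-2.2.29
#   $(cegma_command genome.fa)
cegma_command genome.fa
```

Printing the command instead of running it directly makes it easy to confirm the thread count before submitting the job.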

Documentation

Online documentation: http://korflab.ucdavis.edu/Datasets/cegma

Original article: http://bioinformatics.oxfordjournals.org/content/23/9/1061.full.pdf+html

Licence

GNU GENERAL PUBLIC LICENSE

Program manager

meta@cesnet.cz

Homepage

http://korflab.ucdavis.edu/Datasets/cegma