InGAP-CDG

Description

Currently, when no closely related reference genome is available, most gene prediction methods detect coding sequences (CDSs) from transcriptome assemblies. However, these methods are of limited use because highly fragmented transcripts and extensive assembly errors may lead to redundant or false CDS predictions. Here we present a novel algorithm, inGAP-CDG, for effective construction of full-length and non-redundant CDSs from unassembled transcriptomes. inGAP-CDG achieves this by combining a newly developed codon-based de Bruijn graph, which simplifies the assembly process, with a machine-learning-based approach that filters false positives. Compared with other methods, inGAP-CDG exhibits significantly increased predicted CDS length and greater robustness to sequencing errors and varied read lengths.

License

GNU General Public License

Usage

Upcoming module system change alert!

Because of the large number of applications and their versions, it is not practical to keep them explicitly listed on our wiki pages. Therefore, an upgrade of the modulefiles is underway. One feature of this upgrade is a default module for every application. The default module can be loaded without a version number and loads some (usually the latest) version.

You can test the new version now by adding a line

source /cvmfs/software.metacentrum.cz/modulefiles/5.1.0/loadmodules

to your script before loading a module. Then you can list all versions of ingap-cdg and load the default version of ingap-cdg with

module avail ingap-cdg/ # list available modules
module load ingap-cdg   # load (default) module


If you wish to keep using the current system, that is still possible. Simply list all modules with

module avail ingap-cdg

and choose the explicit version you want to use.
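
The two loading paths described above can be combined in a job script. The following is a minimal sketch, not an official MetaCentrum recipe: the `load_ingap_cdg` function name and the fallback logic are assumptions made for illustration, while the `loadmodules` path and the module name are the ones quoted on this page.

```shell
#!/bin/bash
# Sketch: opt in to the new modulefiles when they are reachable, then load ingap-cdg.
# load_ingap_cdg is a hypothetical helper; only the path and module name come from this page.
NEW_MODULES=${NEW_MODULES:-/cvmfs/software.metacentrum.cz/modulefiles/5.1.0/loadmodules}

load_ingap_cdg() {
    # Source the new modulefiles only if the file is readable on this node.
    if [ -r "$NEW_MODULES" ]; then
        source "$NEW_MODULES"
    fi
    # Refuse to continue when no module command is defined in this shell.
    if command -v module >/dev/null 2>&1; then
        module load ingap-cdg
    else
        echo "module command not found in this shell" >&2
        return 1
    fi
}
```

Calling `load_ingap_cdg` near the top of a script loads the default version under the new system. Under the current system an explicit version chosen from `module avail ingap-cdg` may be needed instead.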

Documentation

    (1) inGAP-CDG_readToCDS
            ./inGAP-CDG_readToCDS  [options]
        Options:
         -i, --input_file             Input filename (in FASTA format). [required]
         -o, --output_dir             Output directory name. If it does not exist, the program will create it.
         -n, --threads_num            The number of threads used by OpenMP. [default: 1]
         -L, --train_seq_len          The minimal length of CDSs in the positive data set for the SVM. [default: 1000]
         -l, --potential_ORFs_cutoff  Potential ORFs longer than --potential_ORFs_cutoff*read_length will be kept. [default: 0.8]
         -d, --svm_dev                The SVM classification vector value used to filter false-positive ORFs. [default: 0] [-0.1, 0.1]
         -k, --kmer_length            The k-mer size (a multiple of 3) used to construct the codon-based de Bruijn graph. [default: 27]
         -p, --subgraph_size          The minimal subgraph size required for traversal. [default: 300]
         -t, --tips_length            The cutoff length of tips to be trimmed in the de Bruijn graph. [default: 2*kmer_length]
         -h, --help                   Display the help information for options.
    (2) inGAP-CDG_transcriptToCDS
            ./inGAP-CDG_transcriptToCDS  [options]
        Options:
         -i, --input_file             Input filename (in FASTA format). [required]
         -o, --output_dir             Output directory name. If it does not exist, the program will create it.
         -n, --threads_num            The number of threads used by OpenMP. [default: 1]
         -L, --train_seq_len          The minimal length of CDSs in the positive data set for the SVM. [default: 1500]
         -l, --test_seq_len           The length of the test sequence for the SVM. [default: 100]
         -d, --svm_dev                The SVM classification vector value used to filter false-positive ORFs. [default: 0] [-0.1, 0.1]
         -k, --kmer_length            The k-mer size (a multiple of 3) used to construct the codon-based de Bruijn graph. [default: 27]
         -p, --subgraph_size          The minimal subgraph size required for traversal. [default: 300]
         -t, --tips_length            The cutoff length of tips to be trimmed in the de Bruijn graph. [default: 2*kmer_length]
         -h, --help                   Display the help information for options.
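
As an illustration of how the options above fit together, here is a minimal sketch that assembles an `inGAP-CDG_readToCDS` command line. It prints the command without running it. The helper name `build_readtocds_cmd` and the file names `reads.fasta` and `cds_out` are placeholders, not part of inGAP-CDG. The sketch assumes that the k-mer size must be a multiple of 3 (a reading of the option description, given the codon-based graph) and that the binary is on `PATH` after the module is loaded.

```shell
#!/bin/bash
# Sketch: build (but do not run) an inGAP-CDG_readToCDS command line.
# build_readtocds_cmd, reads.fasta and cds_out are illustrative placeholders.

build_readtocds_cmd() {
    local input=$1 outdir=$2 threads=${3:-1} kmer=${4:-27}
    # Assumption: the k-mer size has to be a multiple of 3 for the codon-based graph.
    if [ $((kmer % 3)) -ne 0 ]; then
        echo "kmer_length must be a multiple of 3, got $kmer" >&2
        return 1
    fi
    # The documented default for --tips_length is 2*kmer_length; it is passed explicitly here.
    local tips=$((2 * kmer))
    echo "inGAP-CDG_readToCDS -i $input -o $outdir -n $threads -k $kmer -t $tips"
}

# Print the command for 4 threads and the default k-mer size of 27.
build_readtocds_cmd reads.fasta cds_out 4 27
# prints: inGAP-CDG_readToCDS -i reads.fasta -o cds_out -n 4 -k 27 -t 54
```

Printing the command first makes it easy to check the derived `-t` value before submitting a job. The same pattern would apply to `inGAP-CDG_transcriptToCDS`, where `-l` means `--test_seq_len` rather than `--potential_ORFs_cutoff`.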

Homepage

https://sourceforge.net/projects/ingap-cdg/