InGAP-CDG

Description

Currently, most gene prediction methods detect coding sequences (CDSs) from a transcriptome assembly when closely related reference genomes are lacking. However, these methods are of limited use because of highly fragmented transcripts and extensive assembly errors, which may lead to redundant or false CDS predictions. Here we present a novel algorithm, inGAP-CDG, for effective construction of full-length and non-redundant CDSs from unassembled transcriptomes. inGAP-CDG achieves this by combining a newly developed codon-based de Bruijn graph, which simplifies the assembly process, with a machine-learning-based approach that filters false positives. Compared with other methods, inGAP-CDG predicts significantly longer CDSs and is more robust to sequencing errors and varied read lengths.

Availability

version 1.2 

Use

module add ingap-cdg-1.2

This adds the directory containing the program binaries to your PATH.
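
To confirm that the module loaded correctly, you can check that both executables named in the Documentation section resolve on your PATH. The following is only a minimal sketch: the helper name check_ingap_tools is hypothetical (it is not part of inGAP-CDG), and it assumes the binaries are installed under the names inGAP-CDG_readToCDS and inGAP-CDG_transcriptToCDS.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with inGAP-CDG): report whether the two
# executables listed in the Documentation section can be found on PATH.
check_ingap_tools() {
    missing=0
    for tool in inGAP-CDG_readToCDS inGAP-CDG_transcriptToCDS; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Exit status 0 if both tools were found, 1 otherwise.
    return "$missing"
}

# On a MetaCentrum node, load the module first, then run the check:
#   module add ingap-cdg-1.2
#   check_ingap_tools
```

If either tool is reported as missing, the module was probably not loaded in the current shell session.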

Documentation

    (1) inGAP-CDG_readToCDS
            ./inGAP-CDG_readToCDS  [options]
        Options: 
         -i, --input_file             Please enter your input filename (in fasta format). [required]
         -o, --output_dir             Please enter your output directory filename. If not exists, the program will create it.  
         -n, --threads_num            The thread number supported by openmp. [default: 1]
         -L, --train_seq_len          The minimal length of CDSs in positive data set for SVM. [default: 1000] 
         -l, --potential_ORFs_cutoff  The potential ORFs that are larger than --potential_ORFs_cutoff*read_length will be kept. [default: 0.8] 
         -d, --svm_dev                The SVM classification vector value to filter false positive ORFs. [default: 0] [-0.1, 0.1]
         -k, --kmer_length            The kmer size (a triple number) used to construct codon-based de Bruijn graph. [default: 27] 
         -p, --subgraph_size          The minimal number of subgraph size used to traverse. [default: 300]
         -t, --tips_length            The cutoff length of tips to be trimmed in de Bruijn graph. [default: 2*kmer_length]
         -h, --help                   Display the help information for options.
    (2) inGAP-CDG_transcriptToCDS
            ./inGAP-CDG_transcriptToCDS  [options]
        Options: 
         -i, --input_file             Please enter your input filename (in fasta format). [required]
         -o, --output_dir             Please enter your output directory filename. If not exists, the program will create it. 
         -n, --threads_num            The thread number supported by openmp. [default: 1]
         -L, --train_seq_len          The minimal length of CDSs in positive data set for SVM. [default: 1500] 
         -l, --test_seq_len           The length of test sequence for SVM. [default: 100]
         -d, --svm_dev                The SVM classification vector value to filter false positive ORFs. [default: 0] [-0.1, 0.1]
         -k, --kmer_length            The kmer size (a triple number) used to construct codon-based de Bruijn graph. [default: 27] 
         -p, --subgraph_size          The minimal number of subgraph size used to traverse. [default: 300]
         -t, --tips_length            The cutoff length of tips to be trimmed in de Bruijn graph. [default: 2*kmer_length]
         -h, --help                   Display the help information for options.
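
The options above can be combined into a command line as in the sketch below. It only prints the inGAP-CDG_readToCDS command rather than running it, so you can review it before submitting a job. The wrapper name build_read_to_cds_cmd and the file names reads.fasta and cds_out are hypothetical. The check on the k-mer length rests on an assumption: that "a triple number" in the help text means a multiple of 3, which is consistent with the default of 27.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of inGAP-CDG): print, but do not run,
# an inGAP-CDG_readToCDS command line built from documented options only.
# Assumption: "a triple number" means the k-mer length is a multiple of 3.
build_read_to_cds_cmd() {
    input=$1            # FASTA file of reads (-i, required)
    outdir=$2           # output directory (-o)
    threads=${3:-1}     # OpenMP threads (-n), documented default 1
    kmer=${4:-27}       # k-mer length (-k), documented default 27

    if [ $((kmer % 3)) -ne 0 ]; then
        echo "error: k-mer length $kmer is not a multiple of 3" >&2
        return 1
    fi
    echo "inGAP-CDG_readToCDS -i $input -o $outdir -n $threads -k $kmer"
}

# Example: preview the command for a reads file using 4 threads.
build_read_to_cds_cmd reads.fasta cds_out 4 27
```

The same pattern would apply to inGAP-CDG_transcriptToCDS, which takes the same -i, -o, -n and -k options according to the help text above.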

Licence

 GNU General Public License

Program manager

meta@cesnet.cz

Homepage

https://sourceforge.net/projects/ingap-cdg/