GenomeAnalysisTK (GATK)

Description

The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Availability

Versions 2.7-2, 3.7, 3.8-0 and 4.0.1.0 are freely available to users.

Available modules:

  • gatk-2.7.2
  • gatk-3.7
  • gatk-3.8-0
  • gatk-4.0.1.0

Licence

BSD 3-clause "New" or "Revised" license

Use

Version 4.0.1.0

First, you have to prepare your environment by executing:

 module load gatk-4.0.1.0

GATK 4 ships with a wrapper script, gatk, which significantly simplifies commands. After loading the module, you can run

  gatk --help    # print help
  gatk --list    # list all available tools in the toolkit

The basic structure of an invocation of a tool named ToolName is:

  gatk [--java-options "jvm args like -Xmx4G go here"] ToolName [GATK args go here]

This is what a command might look like in practice (note that JVM arguments such as -Xmx8G must be passed through --java-options):

  gatk --java-options "-Xmx8G" HaplotypeCaller -R reference.fasta -I input.bam -O output.vcf

If you are not familiar with this syntax, please see the official Quick Start tutorial.
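
The invocation structure above can be wrapped in a small shell helper. The following is only a sketch under stated assumptions: the function name gatk4_run and the DRY_RUN variable are hypothetical and are not part of GATK or of the gatk-4.0.1.0 module. With DRY_RUN=1 the assembled command is printed instead of executed, which is a convenient way to check it before submitting a job.

```shell
# Hypothetical helper (not part of GATK): run a GATK 4 tool with a given Java heap size.
# Usage: gatk4_run <heap size, e.g. 8G> <ToolName> [tool arguments...]
gatk4_run() {
    if [ "$#" -lt 2 ]; then
        echo "usage: gatk4_run <heap> <ToolName> [args...]" >&2
        return 2
    fi
    gatk4_heap="$1"
    gatk4_tool="$2"
    shift 2
    # JVM arguments go into --java-options; the tool name and its arguments follow.
    set -- gatk --java-options "-Xmx${gatk4_heap}" "$gatk4_tool" "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command only (shell quoting is not preserved in the output).
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, gatk4_run 8G HaplotypeCaller -R reference.fasta -I input.bam -O output.vcf runs the wrapper script with --java-options "-Xmx8G" followed by the tool name and its arguments.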

Version 3.8-0 and older

Example of environment initialization:

module add gatk-2.7.2

or

module add gatk-3.7

or

module add gatk-3.8-0

Loading the module also makes Java 7 available (Java 8 for versions 3.7 and 3.8-0) and sets the environment variable $GATK, which points to the GATK installation directory. Example of running one of the tools on sample data (not applicable to version 3.8-0):

java -Xmx2g -jar $GATK/GenomeAnalysisTK.jar -T CountReads -R $GATK/resources/exampleFASTA.fasta -I $GATK/resources/exampleBAM.bam

When processing large data sets, the temporary directory may run out of space, which can terminate the job or slow it down significantly. In that case, add the parameter -Djava.io.tmpdir=$SCRATCHDIR/tmp to the java command.
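
As an illustration of where the parameter belongs, the sketch below creates the temporary directory and places -Djava.io.tmpdir before -jar, since it is a JVM option rather than a GATK argument. The function name gatk3_run and the DRY_RUN variable are hypothetical; the sketch assumes that SCRATCHDIR is set in the job environment and that GATK has been set by loading one of the gatk modules.

```shell
# Hypothetical helper (not part of GATK): run GATK 3.x or older with the JVM
# temporary directory redirected to $SCRATCHDIR/tmp.
# Usage: gatk3_run [GATK arguments, e.g. -T CountReads -R ref.fasta -I in.bam]
gatk3_run() {
    if [ -z "${SCRATCHDIR:-}" ] || [ -z "${GATK:-}" ]; then
        echo "SCRATCHDIR and GATK must be set (load a gatk module first)" >&2
        return 1
    fi
    gatk3_tmp="$SCRATCHDIR/tmp"
    # The directory has to exist before the JVM tries to write into it.
    mkdir -p "$gatk3_tmp" || return 1
    # JVM options (-Xmx, -D...) come before -jar; GATK arguments come after the jar.
    set -- java -Xmx2g "-Djava.io.tmpdir=$gatk3_tmp" -jar "$GATK/GenomeAnalysisTK.jar" "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command only (shell quoting is not preserved in the output).
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling gatk3_run -T CountReads -R $GATK/resources/exampleFASTA.fasta -I $GATK/resources/exampleBAM.bam then corresponds to the sample command above with the temporary directory moved to the scratch space.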

List of tools and version check:

java -Xmx2g -jar $GATK/GenomeAnalysisTK.jar --help
java -Xmx2g -jar $GATK/GenomeAnalysisTK.jar --version

Documentation

Documentation is available at http://www.broadinstitute.org/gatk/guide/ .

Program manager

meta@cesnet.cz

Homepage

http://www.broadinstitute.org/gatk