AlphaFold

From MetaCentrum

Description

This package provides an implementation of the inference pipeline of AlphaFold v2.1.1. This is a completely new model that was entered in CASP14 and published in Nature. For simplicity, we refer to this model as AlphaFold throughout the rest of this document. Source: https://github.com/deepmind/alphafold

Licence

CUDA licence.

To use AlphaFold with CUDA, you must also accept the licence for the cuDNN library. The link is available at https://metavo.metacentrum.cz/cs/myaccount/licence.html.

AlphaFold licence and disclaimer: https://github.com/deepmind/alphafold#license-and-disclaimer

Usage

The application is prepared as a Singularity image, together with the downloaded datasets and example scripts for running it with PBS in MetaCentrum.

  • Location: /storage/brno11-elixir/projects/alphafold

There are four models and two speed/quality presets. Example scripts for all eight combinations of these parameters are prepared in MetaCentrum.

Models

You can control which AlphaFold model to run by adding the --model_preset= flag. We provide the following models:

monomer: This is the original model used at CASP14 with no ensembling.

monomer_casp14: This is the original model used at CASP14 with num_ensemble=8, matching DeepMind's CASP14 configuration. It is provided largely for reproducibility, as it is 8x more computationally expensive for a limited accuracy gain (+0.1 average GDT on CASP14 domains).

monomer_ptm: This is the original CASP14 model fine-tuned with the pTM head, providing a pairwise confidence measure. It is slightly less accurate than the normal monomer model.

multimer: This is the AlphaFold-Multimer model. To use this model, provide a multi-sequence FASTA file. In addition, the UniProt database should have been downloaded.
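The choice between the monomer presets and multimer depends on how many sequences the input FASTA file contains. The following is a minimal Python sketch of that check, not part of AlphaFold or of the MetaCentrum example scripts; the helper names count_fasta_records, suggest_model_preset and model_preset_flag are hypothetical, and defaulting to plain monomer for single-sequence input is an assumption made only for illustration.

```python
# Hypothetical helpers (not part of AlphaFold): pick a --model_preset value
# from the number of records in a FASTA file.
MODEL_PRESETS = ("monomer", "monomer_casp14", "monomer_ptm", "multimer")


def count_fasta_records(fasta_text: str) -> int:
    """Count FASTA records; each record begins with a '>' header line."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def suggest_model_preset(fasta_text: str) -> str:
    """Suggest 'multimer' for multi-sequence input, otherwise 'monomer'.

    Assumption: single-sequence input defaults to the plain monomer preset;
    monomer_casp14 or monomer_ptm must be chosen explicitly.
    """
    n = count_fasta_records(fasta_text)
    if n == 0:
        raise ValueError("no FASTA records found")
    return "multimer" if n > 1 else "monomer"


def model_preset_flag(preset: str) -> str:
    """Format the --model_preset flag, rejecting presets not listed above."""
    if preset not in MODEL_PRESETS:
        raise ValueError(f"unknown model preset: {preset!r}")
    return f"--model_preset={preset}"


if __name__ == "__main__":
    # Two records, so the sketch suggests the multimer preset.
    fasta = ">chain_A\nMKTAYIAK\n>chain_B\nGSHMLEDP\n"
    print(model_preset_flag(suggest_model_preset(fasta)))
```
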

Speed/Quality

You can control the MSA speed/quality tradeoff by adding --db_preset=reduced_dbs or --db_preset=full_dbs to the run command. We provide the following presets:

reduced_dbs: This preset is optimized for speed and lower hardware requirements. It runs with a reduced version of the BFD database. It requires 8 CPU cores (vCPUs), 8 GB of RAM, and 600 GB of disk space.

full_dbs: This runs with all genetic databases used at CASP14.
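Four model presets and two database presets give eight flag combinations. As a rough illustration, the sketch below enumerates them in Python; the function preset_flags is a hypothetical helper, and only the flag names --model_preset and --db_preset and their values are taken from the text above.

```python
from itertools import product

MODEL_PRESETS = ("monomer", "monomer_casp14", "monomer_ptm", "multimer")
DB_PRESETS = ("full_dbs", "reduced_dbs")


def preset_flags(model_preset: str, db_preset: str) -> list[str]:
    """Return the two preset flags for one run (hypothetical helper)."""
    if model_preset not in MODEL_PRESETS:
        raise ValueError(f"unknown model preset: {model_preset!r}")
    if db_preset not in DB_PRESETS:
        raise ValueError(f"unknown db preset: {db_preset!r}")
    return [f"--model_preset={model_preset}", f"--db_preset={db_preset}"]


# Every model preset paired with every database preset: 4 x 2 = 8 runs.
ALL_COMBINATIONS = [preset_flags(m, d) for m, d in product(MODEL_PRESETS, DB_PRESETS)]

if __name__ == "__main__":
    for flags in ALL_COMBINATIONS:
        print(" ".join(flags))
```
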

Tips and useful information

Example runs with different models and speed/quality presets, using the example file seq.fasta (multi.fasta for the multimer model). The test shows how RAM consumption and run duration differ for the same input. The results come from a single run per combination and may be updated.

model           speed/quality  RAM     Duration  cluster - GPU
monomer         full_dbs       185 GB  28 min    glados - RTX2080 - 8GB
monomer         reduced_dbs    153 GB  36 min    glados - RTX2080 - 8GB
monomer_casp14  full_dbs       197 GB  35 min    zia - A100 - 40GB
monomer_casp14  reduced_dbs    38 GB   36 min    zia - A100 - 40GB
monomer_ptm     full_dbs       190 GB  36 min    gita - RTX2080 Ti - 11GB
monomer_ptm     reduced_dbs    55 GB   32 min    gita - RTX2080 Ti - 11GB
multimer        full_dbs       119 GB  75 min    zia - A100 - 40GB
multimer        reduced_dbs    40 GB   73 min    zia - A100 - 40GB
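The measured RAM values can serve as a starting point for a PBS memory request. The sketch below is only one possible way to do this, under stated assumptions: the 20 % safety margin, the rounding to 10 GB steps, the ncpus=8 and ngpus=1 values, and the helper names are all hypothetical choices for illustration, not MetaCentrum recommendations. Because the numbers come from a single run per combination, actual requirements for other inputs may differ.

```python
# (model preset, db preset) -> (RAM in GB, duration in minutes), copied from the table above.
MEASURED = {
    ("monomer", "full_dbs"): (185, 28),
    ("monomer", "reduced_dbs"): (153, 36),
    ("monomer_casp14", "full_dbs"): (197, 35),
    ("monomer_casp14", "reduced_dbs"): (38, 36),
    ("monomer_ptm", "full_dbs"): (190, 36),
    ("monomer_ptm", "reduced_dbs"): (55, 32),
    ("multimer", "full_dbs"): (119, 75),
    ("multimer", "reduced_dbs"): (40, 73),
}


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def suggest_mem_gb(model: str, db: str, margin_percent: int = 20, step_gb: int = 10) -> int:
    """Measured RAM plus a margin (assumed 20 %), rounded up to a multiple of step_gb."""
    ram_gb, _duration_min = MEASURED[(model, db)]
    with_margin = ceil_div(ram_gb * (100 + margin_percent), 100)
    return ceil_div(with_margin, step_gb) * step_gb


def pbs_select(model: str, db: str, ncpus: int = 8, ngpus: int = 1) -> str:
    """Format a PBS select string; ncpus and ngpus defaults are assumptions."""
    return f"select=1:ncpus={ncpus}:ngpus={ngpus}:mem={suggest_mem_gb(model, db)}gb"


if __name__ == "__main__":
    for model, db in MEASURED:
        print(model, db, pbs_select(model, db))
```
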


Documentation

URL: https://github.com/deepmind/alphafold

Homepage

URL: https://github.com/deepmind/alphafold