cuDNN library
Description
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.
License
You have to be registered in the NVIDIA Accelerated Computing Developer Program and agree to its licence; then confirm the licence form.
Notice: This is licensed software. If you want to use it, you must confirm the licence form, but first you have to accept the licence at NVIDIA's site.
Notice 2: These are standalone modules. Usually you need to use them together with one of the CUDA modules.
Usage
Upcoming module system change alert!
Due to the large number of applications and their versions, it is not practical to keep them explicitly listed on our wiki pages. Therefore, an upgrade of the modulefiles is underway. A feature of this upgrade is the existence of a default module for every application. This default choice needs no version number and loads some (usually the latest) version.
You can test the new system now by adding the line
source /cvmfs/software.metacentrum.cz/modulefiles/5.1.0/loadmodules
to your script before loading a module. Then you can list all available versions of cudnn and load the default version as follows:
module avail cudnn/   # list available modules
module load cudnn     # load the (default) module
If you wish to keep using the current system, that is still possible. Simply list all modules with
module avail cudnn
and choose the explicit version you want to use.
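A minimal job-script sketch putting these steps together (the cuda module name and the application command are assumptions; check module avail cuda for the actual labels on your cluster):

#!/bin/bash
# enable the new module system (see above)
source /cvmfs/software.metacentrum.cz/modulefiles/5.1.0/loadmodules
module load cuda     # cuDNN is a standalone module, usually used together with a CUDA module
module load cudnn    # load the (default) cuDNN version
# ... run your GPU application here ...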
GPU clusters supporting cuDNN
cuDNN only works on GPUs with a sufficiently high compute capability.
To write GPU-accelerated programs, you will need to be familiar with high-level programming languages. Most GPU programming is based on the C language and its extensions. More broadly, a background in parallel computing techniques (threading, message passing, vectorization) will help you understand and apply GPU acceleration.

The following table lists the GPU clusters in MetaCentrum and shows whether their GPUs support the cuDNN library:
| Cluster | Nodes | GPUs per node | GPU memory (MiB) | Compute capability | cuDNN | gpu_cap |
|---|---|---|---|---|---|---|
| galdor.metacentrum.cz | galdor1.metacentrum.cz - galdor20.metacentrum.cz | 4x A40 | 45 634 | 8.6 | YES | cuda35,cuda61,cuda75,cuda80,cuda86 |
| luna2022.fzu.cz | luna201.fzu.cz - luna206.fzu.cz | 1x A40 | 45 634 | 8.6 | YES | cuda35,cuda61,cuda75,cuda80,cuda86 |
| fer.natur.cuni.cz | fer1.natur.cuni.cz - fer3.natur.cuni.cz | 8x RTX A4000 | 16 117 | 8.6 | YES | cuda35,cuda61,cuda75,cuda80,cuda86 |
| zefron.cerit-sc.cz | zefron6.cerit-sc.cz | 1x A10 | 22 731 | 8.6 | YES | cuda35,cuda61,cuda75,cuda80,cuda86 |
| zia.cerit-sc.cz | zia1.cerit-sc.cz - zia5.cerit-sc.cz | 4x A100 | 40 536 | 8.0 | YES | cuda35,cuda61,cuda75,cuda80 |
| fau.natur.cuni.cz | fau1.natur.cuni.cz - fau3.natur.cuni.cz | 8x Quadro RTX 5000 | 16 125 | 7.5 | YES | cuda35,cuda61,cuda75 |
| cha.natur.cuni.cz | cha.natur.cuni.cz | 8x GeForce RTX 2080 Ti | 11 019 | 7.5 | YES | cuda35,cuda61,cuda75 |
| gita.cerit-sc.cz | gita1.cerit-sc.cz - gita7.cerit-sc.cz | 2x GeForce RTX 2080 Ti | 11 019 | 7.5 | YES | cuda35,cuda61,cuda75 |
| adan.grid.cesnet.cz | adan1.grid.cesnet.cz - adan61.grid.cesnet.cz | 2x Tesla T4 | 15 109 | 7.5 | YES | cuda35,cuda61,cuda75 |
| glados.cerit-sc.cz | glados2.cerit-sc.cz - glados7.cerit-sc.cz | 2x GeForce RTX 2080 | 7 982 | 7.5 | YES | cuda35,cuda61,cuda75 |
| glados.cerit-sc.cz | glados1.cerit-sc.cz | 1x TITAN V | 12 066 | 7.0 | YES | cuda35,cuda61,cuda70 |
| konos.fav.zcu.cz | konos1.fav.zcu.cz - konos8.fav.zcu.cz | 4x GeForce GTX 1080 Ti | 11 178 | 6.1 | YES | cuda35,cuda61 |
| glados.cerit-sc.cz | glados10.cerit-sc.cz - glados13.cerit-sc.cz | 2x GeForce GTX 1080 Ti | 11 178 | 6.1 | YES | cuda35,cuda61 |
| zefron.cerit-sc.cz | zefron7.cerit-sc.cz | 1x GeForce GTX 1070 | 8 119 | 6.1 | YES | cuda35,cuda61 |
| black1.cerit-sc.cz | black1.cerit-sc.cz | 4x Tesla P100 | 16 280 | 6.0 | YES | cuda35,cuda60 |
| grimbold.metacentrum.cz | grimbold.metacentrum.cz | 2x Tesla P100 | 12 198 | 6.0 | YES | cuda35,cuda60 |
| zefron.cerit-sc.cz | zefron8.cerit-sc.cz | 1x Tesla K40c | 11 441 | 3.5 | YES | cuda35 |
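Once your job is running on a GPU node, you can check which cards you received with nvidia-smi. The compute_cap query field below is only available in recent driver versions, so treat the second command as a sketch:

nvidia-smi -L    # list the GPUs present on the node
nvidia-smi --query-gpu=name,compute_cap --format=csv    # name and compute capability (recent drivers only)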
Submitting GPU jobs
- GPU queues: gpu (24 hours max) and gpu_long (up to 336 hours), both with open access for all MetaCentrum members (see the submission sketch below)
- GPU jobs on the konos cluster can also be run via the priority queue iti (a queue for users from ITI, the Institute of Theoretical Informatics, University of West Bohemia)
- The zubat cluster is available for any job that runs at most 24 hours.
- Users from CEITEC MU and NCBR can run jobs via privileged queues on the zubat cluster.
- The current version of the CUDA drivers (parameter cuda_version) can be checked interactively in the qsub command assembler.
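A submission sketch for a long-running GPU job in the gpu_long queue (job.sh is a hypothetical batch script; adjust the resources to your needs):

qsub -q gpu_long -l select=1:ncpus=1:ngpus=1:mem=10gb -l walltime=336:0:0 job.sh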
Requesting GPUs
The key scheduling constraint is to prevent jobs from sharing GPUs. To ensure this, always use the ngpus=X option in qsub and request one of the GPU queues (gpu, gpu_long, iti).
qsub -l select=1:ncpus=1:mem=10gb:ngpus=X -q gpu
where X is the number of GPU cards required. By default,
resources_default.gpu=1
If a job tries to use more GPU cards than it requested (or than are available), the prolog will not run it.
To plan your job on clusters with a certain compute capability, use the qsub command like this:
qsub -q gpu -l select=1:ncpus=1:ngpus=X:gpu_cap=cuda35 <job batch file>
Using the PBS parameter gpu_mem, it is possible to specify the minimum amount of memory that the GPU card must have:
qsub -q gpu -l select=1:ncpus=1:ngpus=1:gpu_mem=10gb ...
Example
qsub -I -q gpu -l select=1:ncpus=1:ngpus=1:scratch_local=10gb:gpu_mem=10gb -l walltime=24:0:0
This interactive job requests one machine with 1 CPU, 1 GPU card (with at least 10 GB of GPU memory) and 10 GB of local scratch space for 24 hours.
FAQ
Q: How can I recognize which GPUs have been reserved for me by the scheduling system?
A: The IDs of the reserved GPU cards are stored in the CUDA_VISIBLE_DEVICES environment variable. These physical IDs are mapped to the virtual IDs used by the CUDA tools: for example, if CUDA_VISIBLE_DEVICES contains the values 2,3, the CUDA tools will report IDs 0 and 1.
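For example, inside a running job you can print the reserved card IDs directly:

echo $CUDA_VISIBLE_DEVICES    # e.g. prints 2,3; CUDA applications will then see these cards as devices 0 and 1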
Q: I want to use the NVIDIA cuDNN library; which GPU clusters support it?
A: Those with GPUs of compute capability greater than 3.0, which means all of the clusters listed in the table above.
Documentation
https://developer.nvidia.com/cudnn