CuDNN library

From MetaCentrum


The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.


Development tools and environments


Versions and modules:

cudnn-7.4.2-cuda10         (for CUDA 10.0)
cudnn-7.1.4-cuda90         (for CUDA 9.0)
cudnn-7.0                  (for CUDA 8.0 and later)
cudnn-6.0                  (for CUDA 7.5 and later)
cudnn-5.1                  (for CUDA 7.5 and later)
cudnn-5.0                  (for CUDA 7.5 and later)
cudnn-4.0                  (for CUDA 7.0 and later)

Notice: This is licensed software. To use it, you must first accept the licence at NVIDIA's site and then confirm the licence form.
Notice 2: These are standalone modules; you usually need to load them together with one of the CUDA modules.
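For example, a cuDNN module is typically loaded alongside the CUDA module it was built against. A minimal sketch, pairing versions from the list above (the CUDA module name is an assumption — check what is actually available):

```shell
# Module names follow the version list above; the CUDA module name
# is an assumption -- list the available ones with `module avail cuda`.
module load cuda-10.0
module load cudnn-7.4.2-cuda10
```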

Supporting GPU clusters

CuDNN only works on GPUs with a sufficiently high Compute Capability. The table below lists the individual GPU clusters and shows whether their GPUs support the CuDNN library:


To write GPU-accelerated programs, you will need to be familiar with high-level programming languages. Most GPU programming is based on the C language and its extensions. More broadly, a background in parallel computing techniques (threading, message passing, vectorization) will help you understand and apply GPU acceleration.

GPU clusters in MetaCentrum

Cluster   GPUs per node                            Compute Capability   CuDNN   gpu_cap=
-         2x nVidia Tesla T4 16GB                  7.5                  YES     cuda35,cuda61,cuda75
-         2x nVidia Tesla K20 5GB (aka Kepler)     3.5                  YES     cuda35
-         4x nVidia GeForce GTX 1080 Ti            6.1                  YES     cuda35,cuda61
-         2x nVidia Tesla K20Xm 6GB (aka Kepler)   3.5                  YES     cuda35
-         nVidia 1080Ti GPU                        6.1                  YES     cuda35,cuda61
-         nVidia 1080Ti GPU                        6.1                  YES     cuda35,cuda61
-         nVidia TITAN V GPU                       7.0                  YES     cuda35,cuda61,cuda70
-         nVidia Tesla K40                         3.5                  YES     cuda35
-         nVidia Tesla P100                        6.0                  YES     cuda35,cuda60
-         2x nVidia Tesla P100                     6.0                  YES     cuda35,cuda60

Submitting GPU jobs

  • GPU queues: gpu (24 hours max) and gpu_long (both with open access for all MetaCentrum members)
  • GPU jobs on the konos cluster can also be run via the priority queue iti (a queue for users from ITI - Institute of Theoretical Informatics, Univ. of West Bohemia)
  • The zubat cluster is available for any job that will run at most 24 hours.
  • Users from CEITEC MU and NCBR can run jobs via privileged queues on the zubat cluster.

Requesting GPUs

The key scheduling constraint is to prevent jobs from sharing GPUs. To ensure this, always use the ngpus=X flag in qsub and request one of the GPU queues (gpu, gpu_long, iti).

-l select=1:ngpus=X -q gpu

where X is the number of GPU cards required.


If a job tries to use more GPU cards than it requested (or than are available), the prolog will not let it run.
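As an illustrative sketch (the queue and resource values here are examples only, not a recommendation), a request for one node with two GPU cards in the gpu_long queue could look like:

```shell
# Illustrative request: 1 node, 2 CPU cores, 2 GPU cards, gpu_long queue.
qsub -q gpu_long -l select=1:ncpus=2:ngpus=2 <job batch file>
```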

To plan your job on clusters with a certain Compute Capability, use the qsub command like this:

qsub -q gpu -l select=1:ncpus=1:ngpus=X:gpu_cap=cuda35 <job batch file>


qsub -q gpu -l select=1:ngpus=1 -I

This interactive job requests one machine and one GPU card in the queue with a 24-hour limit.


Q: How can I recognize which GPUs have been reserved for me by the planning system?

A: The IDs of your GPU cards are stored in the CUDA_VISIBLE_DEVICES variable. These IDs are mapped to the virtual IDs used by CUDA tools: for example, if CUDA_VISIBLE_DEVICES contains the values 2 and 3, CUDA tools will report them as IDs 0 and 1.
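The renumbering can be checked from the shell. A small sketch that simulates the variable as the scheduler would set it (the value 2,3 is only an example) and counts the cards assigned to the job:

```shell
# Simulate the scheduler assigning physical cards 2 and 3 to this job.
# CUDA tools will renumber them from 0: card 2 -> ID 0, card 3 -> ID 1.
export CUDA_VISIBLE_DEVICES=2,3

# Count how many GPU cards the job was given.
NGPUS=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
echo "$NGPUS"
```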

Q: I want to use the NVIDIA CuDNN library, which GPU clusters do support it?

A: Those which have GPUs with Compute Capability > 3.0, which means the doom and zubat clusters (see the table above).

Q: Where can I get more information about the GPU cards installed in a cluster?

A: Click the name of the cluster in the table above; a website with detailed information will appear.


You have to be registered in the NVIDIA Accelerated Computing Developer Program and agree with their licence. Then confirm the licence form.


module load cudnn-7.0
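Putting it together, a job batch file might look like the following sketch. The CUDA module name and the program name ./my_gpu_program are hypothetical placeholders; the resource values are examples only:

```shell
#!/bin/bash
#PBS -q gpu
#PBS -l select=1:ncpus=1:ngpus=1:gpu_cap=cuda35
#PBS -l walltime=4:00:00

# Load cuDNN together with a matching CUDA module.
# The CUDA module name is an assumption -- check `module avail cuda`.
module load cuda-8.0
module load cudnn-7.0

# Run the GPU program; ./my_gpu_program is a hypothetical placeholder.
cd "$PBS_O_WORKDIR"
./my_gpu_program
```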




Program manager