Description

The NVIDIA® CUDA® Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. The CUDA Toolkit includes a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing the performance of your applications. You’ll also find programming guides, user manuals, API reference, and other documentation to help you get started quickly accelerating your application with GPUs.

Several machines in MetaCentrum are equipped with GPU cards. If you need to know their CUDA compute capability, see the table of GPU clusters below.

Metacentrum wiki is deprecated after March 2023
Dear users, due to integration of Metacentrum into https://www.e-infra.cz/en (e-INFRA CZ service), the documentation for users will change format and site.
The current wiki pages won't be updated after end of March 2023. They will, however, be kept for a few months for backwards reference.
The new documentation resides at https://docs.metacentrum.cz.

To write GPU-accelerated programs, you need to be familiar with a high-level programming language. Most GPU programming is based on the C language and its extensions. More broadly, a background in parallel computing techniques (threading, message passing, vectorization) will help you understand and apply GPU acceleration.

GPU clusters in MetaCentrum
Cluster	Nodes	GPUs per node	Memory per GPU (MiB)	Compute capability	cuDNN	gpu_cap=
galdor.metacentrum.cz	galdor1.metacentrum.cz - galdor20.metacentrum.cz	4x A40	45 634	8.6	YES	cuda35,cuda61,cuda75,cuda80,cuda86
luna2022.fzu.cz	luna201.fzu.cz - luna206.fzu.cz	1x A40	45 634	8.6	YES	cuda35,cuda61,cuda75,cuda80,cuda86
fer.natur.cuni.cz	fer1.natur.cuni.cz - fer3.natur.cuni.cz	8x RTX A4000	16 117	8.6	YES	cuda35,cuda61,cuda75,cuda80,cuda86
zefron.cerit-sc.cz	zefron6.cerit-sc.cz	1x A10	22 731	8.6	YES	cuda35,cuda61,cuda75,cuda80,cuda86
zia.cerit-sc.cz	zia1.cerit-sc.cz - zia5.cerit-sc.cz	4x A100	40 536	8.0	YES	cuda35,cuda61,cuda75,cuda80
fau.natur.cuni.cz	fau1.natur.cuni.cz - fau3.natur.cuni.cz	8x Quadro RTX 5000	16 125	7.5	YES	cuda35,cuda61,cuda75
cha.natur.cuni.cz	cha.natur.cuni.cz	8x GeForce RTX 2080 Ti	11 019	7.5	YES	cuda35,cuda61,cuda75
gita.cerit-sc.cz	gita1.cerit-sc.cz - gita7.cerit-sc.cz	2x GeForce RTX 2080 Ti	11 019	7.5	YES	cuda35,cuda61,cuda75
adan.grid.cesnet.cz	adan1.grid.cesnet.cz - adan61.grid.cesnet.cz	2x Tesla T4	15 109	7.5	YES	cuda35,cuda61,cuda75
glados.cerit-sc.cz	glados2.cerit-sc.cz - glados7.cerit-sc.cz	2x GeForce RTX 2080	7 982	7.5	YES	cuda35,cuda61,cuda75
glados.cerit-sc.cz	glados1.cerit-sc.cz	1x TITAN V	12 066	7.0	YES	cuda35,cuda61,cuda70
konos.fav.zcu.cz	konos1.fav.zcu.cz - konos8.fav.zcu.cz	4x GeForce GTX 1080 Ti	11 178	6.1	YES	cuda35,cuda61
glados.cerit-sc.cz	glados10.cerit-sc.cz - glados13.cerit-sc.cz	2x GeForce GTX 1080 Ti	11 178	6.1	YES	cuda35,cuda61
zefron.cerit-sc.cz	zefron7.cerit-sc.cz	1x GeForce GTX 1070	8 119	6.1	YES	cuda35,cuda61
black1.cerit-sc.cz	black1.cerit-sc.cz	4x Tesla P100	16 280	6.0	YES	cuda35,cuda60
grimbold.metacentrum.cz	grimbold.metacentrum.cz	2x Tesla P100	12 198	6.0	YES	cuda35,cuda60
zefron.cerit-sc.cz	zefron8.cerit-sc.cz	1x Tesla K40c	11 441	3.5	YES	cuda35
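
In the table above, each gpu_cap tag appears to be the compute capability with the dot removed and a "cuda" prefix (8.6 becomes cuda86), and nodes also advertise tags for lower capabilities. The following is a minimal sketch of that naming pattern, inferred from the table only; the function name cap_to_tag is hypothetical and not part of any MetaCentrum tooling.

```shell
#!/bin/sh
# Hypothetical helper: turn a compute capability such as "8.6"
# into the matching gpu_cap tag ("cuda86"), following the naming
# pattern seen in the table. This is an inferred convention.
cap_to_tag() {
    # Strip the dot and prefix the result with "cuda".
    echo "cuda$(printf '%s' "$1" | tr -d '.')"
}

# Example: the tag for a Tesla T4 (compute capability 7.5).
cap_to_tag 7.5
```

The resulting tag can then be used as the value of gpu_cap= in a qsub select statement.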

Submitting GPU jobs

GPU queues: gpu (24 hours max) and gpu_long (up to 336 hours), both open to all MetaCentrum members.
GPU jobs on the konos cluster can also be run via the priority queue iti (a queue for users from ITI, the Institute of Theoretical Informatics, Univ. of West Bohemia).
The zubat cluster is available for any job that runs for at most 24 hours.
Users from CEITEC MU and NCBR can run jobs via privileged queues on the zubat cluster.

The current version of the CUDA drivers (parameter cuda_version) can be checked interactively in the qsub command assembler.

Requesting GPUs

The key scheduling constraint is that jobs must not share GPUs. To ensure this, always use the ngpus=X resource in qsub and request one of the GPU queues (gpu, gpu_long, iti).

qsub -l select=1:ncpus=1:mem=10gb:ngpus=X -q gpu

where X is the number of GPU cards required. The default is

resources_default.gpu=1

If a job requires more GPU cards than it requested (or than are available), the prolog will not run it.

To schedule your job on clusters with a particular compute capability, use a qsub command like this:

qsub -q gpu -l select=1:ncpus=1:ngpus=X:gpu_cap=cuda35 <job batch file>

The PBS parameter gpu_mem lets you specify the minimum amount of memory that the GPU card must have.

qsub -q gpu -l select=1:ncpus=1:ngpus=1:gpu_mem=10gb ...
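
The select options shown above can be assembled programmatically. The sketch below builds the select string from a GPU count, an optional gpu_cap tag, and an optional gpu_mem value. The helper name gpu_select and the script name job.sh are hypothetical, and combining gpu_cap and gpu_mem in a single chunk is an assumption (the page shows them only separately). The command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: build a PBS select string for a GPU job.
# $1 = number of GPUs, $2 = gpu_cap tag (optional), $3 = minimum GPU memory (optional)
gpu_select() {
    sel="select=1:ncpus=1:ngpus=$1"
    # Append the optional constraints only when they are given.
    if [ -n "${2:-}" ]; then sel="$sel:gpu_cap=$2"; fi
    if [ -n "${3:-}" ]; then sel="$sel:gpu_mem=$3"; fi
    echo "$sel"
}

# Print the command instead of running it; job.sh is a placeholder name.
echo "qsub -q gpu -l $(gpu_select 1 cuda75 10gb) job.sh"
```

Printing the command first makes it easy to review the resource request before submitting it.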

Example

qsub -I -q gpu -l select=1:ncpus=1:ngpus=1:scratch_local=10gb:gpu_mem=10gb -l walltime=24:0:0

This interactive job requests 1 machine with 1 CPU, 1 GPU card with at least 10 GB of GPU memory, and 10 GB of local scratch for 24 hours.

FAQ

Q: How can I tell which GPUs the scheduling system has reserved for me?

A: The IDs of the reserved GPU cards are stored in the CUDA_VISIBLE_DEVICES variable. These IDs are mapped to the virtual IDs used by CUDA tools. Thus, if CUDA_VISIBLE_DEVICES contains the values 2,3, the CUDA tools will report IDs 0,1.
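
The mapping described in this answer can be illustrated with a short sketch that lists each entry of CUDA_VISIBLE_DEVICES next to the virtual ID that CUDA tools would report for it. The function name map_gpu_ids is hypothetical, and the fallback value 2,3 is used only so the example also runs outside a job.

```shell
#!/bin/sh
# Hypothetical helper: print the virtual ID (as seen by CUDA tools)
# for every entry of a CUDA_VISIBLE_DEVICES-style list.
map_gpu_ids() {
    i=0
    # Split the comma-separated list into words.
    for phys in $(printf '%s' "$1" | tr ',' ' '); do
        echo "CUDA device $i -> physical GPU $phys"
        i=$((i + 1))
    done
}

# Inside a job the variable is set by the scheduler;
# "2,3" is only a fallback for illustration.
map_gpu_ids "${CUDA_VISIBLE_DEVICES:-2,3}"
```

For the value 2,3 this prints that CUDA device 0 corresponds to physical GPU 2 and CUDA device 1 to physical GPU 3.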

Q: I want to use the NVIDIA cuDNN library. Which GPU clusters support it?

A: Those whose GPUs have compute capability > 3.0, which means all clusters (see the table above).

License

Own.

To use CUDA, you will also need to accept the license for the cuDNN library. You will find the link at https://metavo.metacentrum.cz/cs/myaccount/licence.html.

Use

Upcoming module system change alert!

Due to the large number of applications and their versions, it is not practical to keep them explicitly listed on our wiki pages. Therefore, an upgrade of the modulefiles is underway. One feature of this upgrade is a default module for every application. The default module needs no version number and loads some (usually the latest) version.

You can test the new version now by adding the line

source /cvmfs/software.metacentrum.cz/modulefiles/5.1.0/loadmodules

to your script before loading a module. Then you can list all versions of cuda and load the default version with

module avail cuda/   # list available modules
module load cuda     # load the default module

If you prefer to stay with the current system, that is still possible. Simply list all modules with

module avail cuda

and choose the explicit version you want to use.
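
The module-loading steps described in this section could be wrapped in a small function for use in job scripts. This is only a sketch: the function name load_cuda is hypothetical, and falling back to the already configured module setup when the loadmodules file is not readable is an assumption, not documented behaviour.

```shell
#!/bin/sh
# Hypothetical wrapper: initialise the new module system if its
# init file is readable, then load the default cuda module.
load_cuda() {
    init="${1:-/cvmfs/software.metacentrum.cz/modulefiles/5.1.0/loadmodules}"
    if [ -r "$init" ]; then
        . "$init"
    else
        echo "warning: $init not readable, using the current module setup" >&2
    fi
    # The module command exists only where a module system is installed.
    if command -v module >/dev/null 2>&1; then
        module load cuda
    else
        echo "error: module command not available" >&2
        return 1
    fi
}

# Usage inside a job script:
#   load_cuda || exit 1
```

To pin an explicit version instead, replace the module load line with the version chosen from the module avail listing.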

Tip: Consider using the cuDNN library in addition to CUDA.

Documentation

http://docs.nvidia.com/cuda/#axzz3uZeYiz1h

Homepage

https://developer.nvidia.com/cuda-toolkit
