GPU clusters


To write GPU-accelerated programs, one needs to be familiar with a high-level programming language. Most GPU programming is based on the C language and its extensions. More broadly, a background in parallel computing techniques (threading, message passing, vectorization) helps in understanding and applying GPU acceleration.


GPU clusters in MetaCentrum
Cluster             | Nodes                                        | GPUs per node                       | Compute Capability | CuDNN | gpu_cap=
doom.metacentrum.cz | doom1.metacentrum.cz - doom30.metacentrum.cz | 2x nVidia Tesla K20 5 GB (Kepler)   | 3.5                | yes   | cuda20,cuda35
konos.fav.zcu.cz    | konos1.fav.zcu.cz - konos8.fav.zcu.cz        | 4x nVidia GeForce GTX 1080 Ti       | 6.1                | yes   | cuda20,cuda35,cuda61
gram.zcu.cz         | gram1.zcu.cz - gram10.zcu.cz                 | 4x nVidia Tesla M2090 6 GB          | 2.0                | no    | cuda20
zubat.ncbr.muni.cz  | zubat1.ncbr.muni.cz - zubat8.ncbr.muni.cz    | 2x nVidia Tesla K20Xm 6 GB (Kepler) | 3.5                | yes   | cuda20,cuda35
glados.cerit-sc.cz  | glados10.cerit-sc.cz - glados16.cerit-sc.cz  | nVidia GeForce GTX 1080 Ti          | 6.1                | yes   | cuda20,cuda35,cuda61
glados.cerit-sc.cz  | glados17.cerit-sc.cz                         | nVidia GeForce GTX 1080 Ti          | 6.1                | yes   | cuda20,cuda35,cuda61
glados.cerit-sc.cz  | glados1.cerit-sc.cz                          | nVidia TITAN V                      | 7.0                | yes   | cuda20,cuda35,cuda61,cuda70
zefron.cerit-sc.cz  | zefron8.cerit-sc.cz                          | nVidia Tesla K40                    | 3.5                | yes   | cuda20,cuda35


Submitting GPU jobs

  • GPU queues: gpu (24 hours max) and gpu_long (both with open access for all MetaCentrum members); see the example below
  • GPU jobs on the konos cluster can also be run via the priority queue iti (a queue for users from ITI - Institute of Theoretical Informatics, University of West Bohemia)
  • The zubat cluster is available to any job that runs for at most 24 hours.
  • Users from CEITEC MU and NCBR can run jobs via privileged queues on the zubat cluster.
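
For illustration, a submission to either open queue might look as follows. This is a minimal sketch, not an official template; job.sh stands for your own batch script:

qsub -q gpu -l select=1:ngpus=1 job.sh
qsub -q gpu_long -l select=1:ngpus=1 job.sh

The first form is limited to 24 hours of walltime; the second is intended for longer jobs.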

Requesting GPUs

The key scheduling constraint is to prevent jobs from sharing GPUs. To ensure this, always use the ngpus=X resource request in qsub and submit to one of the GPU queues (gpu, gpu_long, iti):

-l select=1:ngpus=X -q gpu

where X is the number of GPU cards required. By default,

resources_default.gpu=1

If a job tries to use more GPU cards than it has requested (or than are available), the prolog will not let it run.
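
For reference, a whole GPU job can be described in a batch script. The following is a minimal sketch rather than an official template; the walltime value, the cuda module name and the ./my_gpu_app binary are illustrative placeholders:

#!/bin/bash
#PBS -q gpu
#PBS -l select=1:ncpus=1:ngpus=1
#PBS -l walltime=4:00:00

# load a CUDA environment module (the exact module name may differ)
module add cuda

# run the GPU application from the directory the job was submitted from
cd $PBS_O_WORKDIR
./my_gpu_app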

To schedule your job on clusters with a certain Compute Capability, use a qsub command like this:

qsub -q gpu -l select=1:ncpus=1:ngpus=X:gpu_cap=cuda35 <job batch file>

Example

qsub -q gpu -l select=1:ngpus=1 -I

This interactive job requests one machine and one GPU card in the queue with a 24-hour limit.

FAQ

Q: How can I tell which GPUs have been reserved for me by the scheduling system?

A: The IDs of the GPU cards assigned to your job are stored in the CUDA_VISIBLE_DEVICES environment variable. CUDA tools map these physical IDs to virtual IDs starting from 0: if CUDA_VISIBLE_DEVICES contains the values 2,3, CUDA tools will report the cards as IDs 0 and 1.
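
To see the assignment in practice, you can inspect the variable inside a running (e.g. interactive) job; a short sketch:

echo $CUDA_VISIBLE_DEVICES   # e.g. 2,3 - physical IDs of the cards reserved for the job
nvidia-smi                   # lists all GPU cards physically present on the node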

Q: I want to use the NVIDIA CuDNN library; which GPU clusters support it?

A: Those which have a GPU with Compute Capability > 3.0, i.e. all clusters in the table above except gram (see the CuDNN column).
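
To check whether a CuDNN build is installed on a particular cluster, you can query the environment modules; a sketch, assuming the module is published under the name cudnn:

module avail cudnn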

Q: Where can I get more information about the GPU cards installed in a cluster?

A: Click the name of the cluster in the table above; a website with detailed information will appear.