Cluster Adan

MetaCentrum wiki is deprecated after March 2023
Dear users, due to the integration of MetaCentrum into the e-INFRA CZ service (https://www.e-infra.cz/en), the user documentation will change format and site.
The current wiki pages won't be updated after the end of March 2023. They will, however, be kept for a few months for reference.
The new documentation resides at https://docs.metacentrum.cz.

Adan (adan.grid.cesnet.cz) is a new GPU cluster dedicated primarily to machine learning, deep learning, and AI.


Hardware specification

Cluster adan[1-61].grid.cesnet.cz has 61 nodes, each with:

  • 32 CPUs Intel Xeon Gold 5218 2.30GHz
  • 192 GiB of RAM
  • disk: 4x240 GB SSD (scratch_ssd)
  • 2 GPUs NVIDIA Tesla T4, 16 GB each

For more information and current load on nodes see https://metavo.metacentrum.cz/pbsmon2/resource/adan.grid.cesnet.cz.
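
Since a PBS chunk is normally placed on a single node, the per-node figures listed above also act as upper bounds for one chunk of a job. The sketch below checks a request against them. The function name fits_adan_node is hypothetical, memory is given in GiB, and the amount of RAM actually available to jobs is likely to be somewhat below the full 192 GiB.

```shell
#!/bin/bash
# Per-node resources of adan[1-61], taken from the hardware list above.
ADAN_NCPUS=32
ADAN_NGPUS=2
ADAN_MEM_GIB=192

# Hypothetical helper: succeeds if one chunk (ncpus, ngpus, memory in GiB)
# does not exceed what a single Adan node offers.
fits_adan_node() {
    local ncpus=$1 ngpus=$2 mem_gib=$3
    [ "$ncpus" -le "$ADAN_NCPUS" ] &&
    [ "$ngpus" -le "$ADAN_NGPUS" ] &&
    [ "$mem_gib" -le "$ADAN_MEM_GIB" ]
}

if fits_adan_node 8 2 64; then
    echo "request fits on one Adan node"
fi
if ! fits_adan_node 8 3 64; then
    echo "3 GPUs in one chunk exceed a single Adan node; split the request into more chunks"
fi
```

A job that needs more than two GPUs would therefore have to request several chunks rather than one larger chunk.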

How to access adan.grid.cesnet.cz

The cluster is accessed through conventional job submission via the PBS Pro batch system (@meta-pbs server), using the standard "gpu" queue (for jobs lasting up to 24 h) or the "gpu_long" queue (for jobs lasting up to 336 h).

To restrict a job in the gpu queue strictly to Adan nodes, add cl_adan=True to the resource request of your qsub command.
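
As an illustration of the two paragraphs above, the sketch below assembles a qsub command line that stays in the gpu queue and pins the job to Adan. It assumes that cl_adan=True is appended to the select chunk like any other resource. The script name my_job.sh and the resource values are placeholders. The command is only printed, so it can be inspected before it is submitted.

```shell
#!/bin/bash
QUEUE="gpu"            # use "gpu_long" for jobs longer than 24 h (up to 336 h)
WALLTIME="04:00:00"    # placeholder; must fit within the queue limit
SCRIPT="my_job.sh"     # placeholder job script

# Assumption: the cluster property is given as part of the select chunk.
SELECT="select=1:ncpus=4:ngpus=1:cl_adan=True"

CMD="qsub -q $QUEUE -l $SELECT -l walltime=$WALLTIME $SCRIPT"
echo "$CMD"
```

Running the printed line on a frontend would submit the job.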

Supported software

TensorFlow

TensorFlow is an open-source software library for numerical computation using data flow graphs.

You will need the CUDA license to use TensorFlow. Follow the instructions on the TensorFlow application page.

Usage of TensorFlow with a Singularity container

@skirit.ics.muni.cz:~$ qsub -q adan -l select=ngpus=1 -I # start interactive job
# wait for the job to get ready

@adan2:~$ # the job is ready

@adan2:~$ singularity build --sandbox tensor docker://nvcr.io/nvidia/tensorflow:19.09-py3 # convert the NVIDIA Docker container into a writable (sandbox) directory
# this takes some time (Exploding layers...)
Singularity container built: tensor
Cleaning up...
@adan2:~$ ls
tensor

@adan2:~$ singularity build tensor.simg tensor # build the Singularity container tensor.simg from the directory tensor
# this takes some time
@adan2:~$ ls
tensor tensor.simg

@adan2:~$ singularity shell --nv tensor.simg # invoke an interactive shell within the container tensor.simg
# the --nv option locates the NVIDIA driver libraries on the host system and bind-mounts them into the container at runtime; the same container therefore works on hosts with different versions of the NVIDIA driver

# TensorFlow is run by importing it as a Python module
Singularity tensor.simg:/> python # run python
>>> import tensorflow as tf # import tensorflow
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session() # start session (TensorFlow 1.x API)
# opening dynamic libraries...

# simple test example 1
>>> sess.run(hello)
b'Hello, TensorFlow!'

# simple test example 2
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> sess.run(a+b)
42

>>> exit() # exit python
Singularity tensor.simg:/> exit # exit singularity
exit
@adan2:~$
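
The interactive session above can also be run non-interactively, which is more convenient inside a batch job. The sketch below uses singularity exec with the same --nv option and the same TensorFlow 1.x session API as the test examples. It assumes that the image tensor.simg built above is in the current directory; the guard only skips the run where Singularity or the image is missing.

```shell
#!/bin/bash
IMAGE="tensor.simg"   # image built in the session above

# Same computation as "simple test example 2", written as a one-liner.
PYCODE="import tensorflow as tf; sess = tf.Session(); print(sess.run(tf.constant(10) + tf.constant(32)))"

if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    # --nv makes the host NVIDIA driver libraries available in the container
    singularity exec --nv "$IMAGE" python -c "$PYCODE"
else
    echo "singularity or $IMAGE not available; skipping the test" >&2
fi
```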

Usage of TensorFlow as a module

qsub -q adan -l select=1:ncpus=1:ngpus=1:gpu_cap=cuda61:scratch_local=1gb -l walltime=00:01:00 tensorflow_test.sh # submit the job on adan

#!/bin/bash
# This is the "tensorflow_test.sh" bash script

DATADIR="/storage/brno2/home/melounova/ADAN/TENSORFLOW" # CHANGE the path to your own directory !
echo "$PBS_JOBID is running on node $(hostname -f) in a scratch directory $SCRATCHDIR" >> "$DATADIR/jobs_info.txt" # info about where the job was run
trap 'clean_scratch' TERM EXIT

module add tensorflow-2.0.0-gpu-python3

cd "$SCRATCHDIR" || exit 1 # stop if the scratch directory is not available

# print "Hello world" (in Tensorflow 2.0 syntax) 
python -c "import tensorflow as tf; msg = tf.constant('TensorFlow 2.0 Hello World'); tf.print(msg)" >> results.out 2>&1

cp results.out "$DATADIR" # copy the results back home
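
The hello-world test above does not show whether TensorFlow actually sees a GPU. A minimal sketch of such a check, which could be placed in the job script after the module is loaded, follows. It assumes the tf.config.experimental.list_physical_devices call, which is available in TensorFlow 2.0; the guard simply skips the check where TensorFlow cannot be imported.

```shell
#!/bin/bash
# One-line Python program listing the GPUs visible to TensorFlow 2.x.
PYCODE="import tensorflow as tf; print(tf.config.experimental.list_physical_devices('GPU'))"

if python -c "import tensorflow" >/dev/null 2>&1; then
    python -c "$PYCODE"   # an empty list [] means that no GPU is visible
else
    echo "TensorFlow is not importable here; load the module first" >&2
fi
```

Within the job script, the output could be appended to results.out in the same way as the hello-world message.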


Theano

Theano is a Python library and compiler for manipulating mathematical expressions, especially matrix-valued ones.

You will need the CUDA license to use Theano. Follow the instructions on the Theano application page.

The simple bash script below tests the basic Theano functionality.

qsub -q adan -l select=1:ncpus=1:ngpus=1:gpu_cap=cuda61:scratch_local=1gb -l walltime=00:01:00 theano_test.sh # submit the job on adan

#!/bin/bash
# This is the "theano_test.sh" bash script

DATADIR="/storage/brno2/home/melounova/ADAN/THEANO" # CHANGE the path to your own directory !
echo "$PBS_JOBID is running on node $(hostname -f) in a scratch directory $SCRATCHDIR" >> "$DATADIR/jobs_info.txt" # info about where the job was run
trap 'clean_scratch' TERM EXIT

module add python36-modules-gcc # Theano is part of the Python modules
export THEANO_FLAGS='device=cuda,floatX=float32' # run on the GPU in single precision
module add cudnn-7.6.4-cuda10.1 # load the cuDNN library (built for CUDA 10.1)

cd "$SCRATCHDIR" || exit 1 # work in the scratch directory requested for the job

# import Theano and let it print out its version
python -c "import theano; import theano.gpuarray; print(theano.__version__)" >> results.out 2>&1

cp results.out "$DATADIR" # copy the results back home