ScaleMP (en)

ScaleMP is a software layer (a hypervisor) that connects multiple ordinary x86 servers into one SMP machine. The goal of this aggregation is to pool the resources (CPU, RAM) of many servers into a single virtual machine. This is practically the opposite of standard virtualization, where one server runs multiple virtual machines. The company ScaleMP calls this architecture versatile SMP (vSMP) and its implementation vSMP Foundation, which is widely known simply as ScaleMP.

Who ScaleMP is good for

Advantages of ScaleMP

  • Your job has all the aggregated memory and CPUs available, so you do not have to parallelize it manually
  • The system can offer significant savings on software licences that are bound to physical nodes (although the legal interpretation may be debatable)

Disadvantages of ScaleMP

  • Generally, an application running on ScaleMP computes more slowly than the same application parallelized manually
  • You have to put some effort into preparing your job if you want it to reach a reasonable computation speed

Generally, the system is particularly suitable for standalone jobs that require a large number of processors and/or a large amount of memory, and for jobs that are difficult to parallelize, where preparing a parallel version would take nontrivial effort. The system is not ideal for running many smaller jobs.

What the machine is made of

The machine alfrid consists of 16 servers (IBM x3550 M4) connected by 2x FDR InfiniBand (SX6036), with 256 cores and 4 TB of memory in total. Scratch storage is provided by an IBM V3700 disk array (24x 600 GB 10k disks, connected via SAS to the master node). The cluster has a 2x 10 Gb/s uplink (from the master node) to the MetaCentrum network located at ZČU. The system also contains a 17th server, which serves as the ScaleMP license and image server, and a local Ethernet switch (SG500-52) for IPMI aggregation and network booting.

How to access the machine

You have to be a member of MetaCentrum (see Application). Users from the groups iti or kky have access automatically; for individual access, contact the hardware owners by email at meta@cesnet.cz and send a copy to honzas@ntis.zcu.cz. If you are affiliated with the NTIS science centre, ask the representatives of the following groups for membership (KKY - Jan Švec|Flídr Miroslav, KMA - Jan Nejedlý, KFY - Jiří Houška, KIV - Ladislav Pešička, KME - Miroslav Horák); they can add new users to the groups (tutorial is here).

You can check your membership in the /etc/group file on any MetaCentrum machine, or on the MetaCentrum web page under My account -> Personal data: in the VO settings tab select VO MetaCentrum, and the list of groups should appear in the lower right section of the page.
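
The /etc/group check can also be scripted. The following is a minimal sketch that assumes the standard group file layout (name:password:GID:comma-separated members); the function name is_group_member is invented for this example. It looks only at the member list in the file, so it will not report a user whose primary group is the one being tested.

```shell
#!/bin/bash
# is_group_member USER GROUP [FILE]
# Succeeds if USER appears in the member list of GROUP in FILE
# (FILE defaults to /etc/group).
is_group_member() {
    local user="$1" group="$2" file="${3:-/etc/group}"
    awk -F: -v u="$user" -v g="$group" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++) if (members[i] == u) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Report which of the two groups with automatic access the current user is in
for g in iti kky; do
    if is_group_member "$(id -un)" "$g"; then
        echo "member of $g"
    fi
done
```

If the loop prints nothing, the current user is not listed in either group on that machine.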

How to run a job on the machine

First of all, if you are new to MetaCentrum, read the How to compute topic.

Jobs reach the machine alfrid through the special queue scalemp, and you have to specify the group through which you have access. A job submission might look like:

qsub -q scalemp -W group_list=iti .....

If you need the whole machine, request all CPUs (-l nodes=1:ppn=256) or request the machine exclusively (-l nodes=1#excl). You can also use the qsub option -I for an interactive job.
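
The options above can be combined into a single submission. The sketch below only prints the resulting command line, so it can be inspected before use. The helper name scalemp_qsub_cmd and the script name job.sh are placeholders invented for this example and are not part of the MetaCentrum tooling.

```shell
#!/bin/bash
# scalemp_qsub_cmd GROUP NCPUS SCRIPT
# Prints (does not run) a qsub command for the scalemp queue.
# NCPUS is a CPU count, or the word "excl" for exclusive access.
scalemp_qsub_cmd() {
    local group="$1" ncpus="$2" script="$3" resources
    if [ "$ncpus" = "excl" ]; then
        resources="nodes=1#excl"
    else
        resources="nodes=1:ppn=$ncpus"
    fi
    echo "qsub -q scalemp -W group_list=$group -l $resources $script"
}

scalemp_qsub_cmd iti 256 job.sh    # whole machine by CPU count
scalemp_qsub_cmd iti excl job.sh   # whole machine, exclusive
```

Once the printed line looks right, it can be pasted into the shell on a MetaCentrum frontend.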

Note: You can log in to the machine alfrid via ssh, but you must use the scheduling system for computing.

How to take full advantage of the machine in a job

Applications running on ScaleMP tend to struggle with computation speed, so it is important to pay attention to where the processed data is located and to take optimization steps.

ScaleMP documentation is available on the ScaleMP web page (after registration); there you can find, for example, best practices and examples for successfully running the most commonly used software (Ansys, Matlab, Fluent, VASP ...).

Further suggestions, examples and tutorials on optimizing your own applications are available on the machine in alfrid:/opt/ScaleMP/examples

We have found the following steps effective:

  1. read at least the basic ScaleMP documentation
  2. check whether there is a detailed tutorial for your software
  3. prepare a basic job that runs for about 1 hour
  4. try to run the job on an empty machine, applying the suggestions from this topic
  5. evaluate the speedup achieved compared with standard hardware
  6. if you are not satisfied, contact our user support (meta@cesnet.cz) or ScaleMP support (Support@ScaleMP.com)
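
For step 5, the comparison comes down to dividing the run time on standard hardware by the run time on alfrid. A minimal sketch follows; the function name speedup and the timings are made up for illustration and are not measurements.

```shell
#!/bin/bash
# speedup BASELINE_SECONDS SCALEMP_SECONDS
# Prints BASELINE/SCALEMP with two decimals.
# A value above 1 means the run on alfrid was faster.
speedup() {
    awk -v base="$1" -v smp="$2" 'BEGIN {
        if (smp <= 0) {
            print "error: run time must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.2f\n", base / smp
    }'
}

# Hypothetical timings: 3600 s on a standard node, 1500 s on alfrid
speedup 3600 1500
```

With these example numbers the script prints 2.40; a result below 1 would suggest the job needs further tuning before it benefits from the machine.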


How difficult it is to edit an application - model example: the basic edit

A first optimization attempt can be made even before running the application: add the following lines to your startup (batch) script:

....
# find the $NCPU most suitable CPUs
CPULIST=`/opt/ScaleMP/numabind/bin/numabind --offset $NCPU --flags=best 2>/dev/null`

# bind this process and its children to these specific CPUs
taskset -pc $CPULIST $$

# run the computation
....

Further improvement can probably be achieved with other optimization settings or libraries offered by ScaleMP; see the documentation links above.
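
If numabind is missing or fails, CPULIST ends up empty and taskset is then called with an empty CPU list. A defensive variant of the basic edit might look like the sketch below. The function name pin_to_cpus and the default of 16 CPUs are assumptions made for this example; the numabind path and flags are the ones used above.

```shell
#!/bin/bash
# pin_to_cpus CPULIST
# Pins the current shell (and therefore the processes it starts later)
# to CPULIST. Returns non-zero without calling taskset if the list is empty.
pin_to_cpus() {
    local cpulist="$1"
    if [ -z "$cpulist" ]; then
        echo "warning: no CPU list, running unpinned" >&2
        return 1
    fi
    taskset -pc "$cpulist" $$ >/dev/null
}

NUMABIND=/opt/ScaleMP/numabind/bin/numabind
NCPU=${NCPU:-16}   # assumed default, override from the environment

if [ -x "$NUMABIND" ]; then
    CPULIST=$("$NUMABIND" --offset "$NCPU" --flags=best 2>/dev/null)
else
    CPULIST=""
fi

pin_to_cpus "$CPULIST" || echo "continuing without CPU pinning"

# run the computation
```

The job still runs when pinning is not possible, but the message in the job output makes the missing pinning visible when you compare run times.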

How difficult it is to edit an application - model example: multiplying two matrices in MATLAB

A simple MATLAB script that multiplies two random matrices:

nproc = str2num(getenv('NPROC'));
maxNumCompThreads(nproc);                   % Enable multithreading
setenv('MKL_NUM_THREADS', getenv('NPROC'));
% Set parameters
numRuns = 1;                                % Number of runs to average over
% dataSize = 100000;                        % Data size to test
dataSize = str2num(getenv('DATASIZE'));
x = rand(dataSize, dataSize);               % Random square matrix
% Matrix multiplication
for i = 1:numRuns
    y = x * x;
end
quit

What is specific to ScaleMP is that processes should be pinned to particular processors before the computation starts:

#! /bin/bash
# Make MATLAB accessible
export PATH=$PATH:/software/matlab-8.5/bin
#
# Get arguments
export NPROC=16
[ "_$1" != "_" ] && NPROC=$1
export DATASIZE=10000
[ "_$2" != "_" ] && DATASIZE=$2
#
# ScaleMP specific stuff
#
# Intel OpenMP and MKL environment variables
export KMP_AFFINITY=compact,granularity=fine
export KMP_LIBRARY=turnaround
export MKL_DYNAMIC=FALSE
export MKL_VSMP=1
#
# Find which CPUs to run MATLAB on
CPULIST=`/opt/ScaleMP/numabind/bin/numabind --offset $NPROC --flags=best 2>/dev/null`
export NUMBIND_RESERVED=$CPULIST
taskset -pc $CPULIST $$
#
# Run MATLAB
echo "Running with $NPROC threads on $DATASIZE x $DATASIZE matrices"
matlab -nodisplay -nojvm -r matmul 2>&1
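
To see how the run time changes with the number of threads, the wrapper can be timed for several values of its first argument. This is a rough sketch: the file name run_matmul.sh is an assumed name for the wrapper script above, time_run is a helper invented for this example, and the thread counts are arbitrary. Timing uses whole seconds, which is adequate for runs lasting minutes.

```shell
#!/bin/bash
# time_run CMD [ARGS...]
# Runs the command, discards its output, and prints the elapsed
# wall-clock time in whole seconds.
time_run() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Assumed file name for the wrapper script; skipped if it is not present
if [ -x ./run_matmul.sh ]; then
    for n in 16 32 64; do
        echo "$n threads: $(time_run ./run_matmul.sh "$n" 10000) s"
    done
fi
```

Run this inside a job submitted to the scalemp queue, not directly in an ssh session on the machine.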