Beginners guide


Grid computing: basic idea

Related topics
Grids and supercomputers
Frontends
Scheduling system
Resources and queues

MetaCentrum offers resources for so-called grid computing. Roughly speaking, a grid is a network of many interconnected computers whose properties (type and size of disk, RAM, CPU, GPU etc.) may differ and which may be located in different places (cities, institutes).

The scheduling system keeps track of the grid's resources (memory, CPU time, disk space) and keeps computational jobs waiting in queues until enough resources are free for them to run. Users prepare and submit their jobs on so-called frontends, machines reserved for user activity. The rest of the grid's machines, the computational nodes, do the computation itself.

[Figure: The grid graphics.jpg]

Log on a frontend machine

Related topics
Usage of PuTTY
Other ways to access the grid from Linux
Other ways to access the grid from Windows
Remote desktop: graphical access
Kerberos: single sign-on system
Kerberos on Linux
Kerberos on Windows

Accessing the grid means logging on to one of the frontends. All frontend machines run Linux. Linux users only need to open a terminal. Windows users need to install PuTTY, which lets them open a Linux terminal from a Windows PC.

Open the terminal and type the following command. Note: the text after "#" is only a comment; do not copy it.

ssh jenicek@skirit.ics.muni.cz # user "jenicek" wants to log on skirit.ics.muni.cz frontend

You can substitute any other frontend for skirit.ics.muni.cz; check the list of all available frontends.

Do not run resource-demanding operations on frontends, such as computing or large-scale compiling and archiving, as they negatively affect all other logged-in users. Such operations must be submitted as an interactive job.

If you log in for the first time, you will probably be prompted with a query similar to the following:

The authenticity of host 'skirit.ics.muni.cz (2001:718:ff01:1:216:3eff:fe20:382)' can't be established.
ECDSA key fingerprint is SHA256:Splg9bGTNCeVSLE0E4tB30pcLS80sWuv0ezHrH1p0xE.
Are you sure you want to continue connecting (yes/no)?

Type "yes" and hit Enter. You will then be prompted for your MetaCentrum password; type it and hit Enter. A MetaCentrum welcome logo and information about your last login, home directories etc. will appear, with a line similar to the following right at the bottom.

jenicek@skirit:~$ # user "jenicek" is now logged on the frontend "skirit", the blinking cursor waits for the command to be typed

There are more tools to access the grid. For those who want to use a single sign-on method of access, there is the Kerberos authentication system. For a full list of articles on access, see this part of the Guidepost.

Know Linux command line

Related topics
[Linux commands]

All frontend machines run Linux, so some knowledge of the Linux CLI (command line interface) is needed. As a number of sites already cover the Linux CLI basics, we will not give an overview here; a list of essential commands can be found e.g. here. All code snippets and examples given in this guide are commented and explained.

Prepare input file(s)

Related topics
Windows EOL problem

Suppose there is a file called h2o.com on your PC which you want to use as input for a calculation done on the grid. To transfer it to a frontend, open a terminal and type:

scp h2o.com jenicek@skirit.metacentrum.cz: # this command copies file from machine to machine

Windows users can use the same command via PuTTY or WinSCP tool.

Windows marks end-of-lines (EOLs) in a text file differently than Linux. This may not be a problem in input files, but a script (a text file containing a series of commands) written on a Windows PC will fail on Linux, producing the following error message:

jenicek@skirit.metacentrum.cz: ./windows_script.sh # user jenicek wants to run a script "windows_script.sh" with Windows EOLs
-bash: ^M: command not found

To make the line endings compatible with Linux, Windows users must first convert their scripts with the dos2unix command.

jenicek@skirit.metacentrum.cz: dos2unix windows_script.sh # convert script "windows_script.sh" from Windows EOLs to Linux EOLs
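
Where dos2unix is not at hand, the same conversion can be approximated with standard tools. The following is a minimal sketch, not part of the MetaCentrum tooling: the helper names has_crlf and strip_crlf and the demo file names are made up for illustration, and tr -d '\r' removes every carriage return in the file, not only those at line ends.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a demo script with Windows (CRLF) line endings.
printf 'echo hello\r\necho world\r\n' > demo_crlf.sh

# A literal carriage return character, built portably.
cr=$(printf '\r')

# has_crlf FILE: succeed if FILE contains at least one carriage return.
has_crlf() {
    grep -q "$cr" "$1"
}

# strip_crlf SRC DST: write a copy of SRC without carriage returns to DST.
strip_crlf() {
    tr -d '\r' < "$1" > "$2"
}

if has_crlf demo_crlf.sh; then
    strip_crlf demo_crlf.sh demo_lf.sh
fi

# The converted copy now runs without the "^M: command not found" error.
sh demo_lf.sh
```

Writing the result to a second file keeps the original untouched, which is convenient if the script still has to be edited on Windows.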

Note 1: Linux is case sensitive, so files "h2o.com", "H2o.com" and "H2O.com" are treated as three different files. The same is valid for commands. For example, command ls lists the content of a directory, whereas Ls or LS will return "command not found" error.

Note 2: When naming files and directories, do not use special characters such as stars, semicolons, ampersands etc. Using a space (e.g. "my file") is possible but not recommended; the shell normally interprets a space as a separator between arguments, so it must be escaped with a backslash (preceded by the "\" character) or the whole name must be enclosed in quotes if you want to use it literally.

jenicek@skirit~: ls my file # this command will look for 2 files called "my" and "file", respectively, in the current directory 
jenicek@skirit~: ls my\ file # this command will look for a file called "my file" in the current directory 
jenicek@skirit~: ls "my file" # equivalent to the line above; inside quotes the space needs no backslash
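
The effect of quoting can be tried safely in a temporary directory. This is a small sketch for illustration only; the helper count_args and the variable names are hypothetical, and the default shell word splitting (on spaces, tabs and newlines) is assumed.

```shell
# Work in a throwaway directory and create a file with a space in its name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch "my file"

# count_args: print how many arguments it received.
count_args() {
    echo "$#"
}

name="my file"
quoted=$(count_args "$name")    # quotes keep the name as one argument
unquoted=$(count_args $name)    # without quotes the shell splits at the space

echo "quoted: $quoted argument(s), unquoted: $unquoted argument(s)"

# Both forms below refer to the single file "my file".
ls my\ file
ls "$name"
```

The same splitting happens inside scripts, which is why variables holding file names are best written in double quotes.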

Text editors on frontend

In order to write or modify a file, you can always copy it to your PC, make the changes in your favourite text editor and then copy it back to the frontend. This may, however, become rather time-consuming. It may be advantageous to learn how to open and modify a file directly on the frontend.

All frontends have two Linux text editors: vi and pico.

jenicek@skirit~: vi h2o.com # this command will open file "h2o.com" in vi text editor
jenicek@skirit~: pico h2o.com # this command will open file "h2o.com" in pico text editor

Both editors can seem a bit user-unfriendly at first, although they are immensely powerful and effective in text editing. If you are going to modify files often, it is advisable to learn to use either of them. Numerous resources can be found on the Internet, e.g. [editor tutorial] and [commands quick reference].

Midnight Commander

You can also view and manipulate files using Midnight Commander, a visual file manager similar to Norton Commander.

jenicek@skirit~: mc # open Midnight commander

Choose your applications

Related topics
List of applications (sorted alphabetically)
List of applications (sorted by topic)
Application modules
How to install an application

A wide range of scientific software is installed on MetaCentrum machines, from mathematical and statistical software through computational chemistry and bioinformatics to technical and material modelling software.

You can load an application offered by MetaCentrum into your job or machine with the command module add followed by the name of the selected application. If you are not sure which version of the application you would like to use, check the complete list of applications page first.

For example:

jenicek@skirit:~$ module avail # shows all currently available applications
jenicek@skirit:~$ module avail 2>&1 | grep g16 # show all modules containing "g16" in their name
jenicek@skirit:~$ module add g16-B.01 # loads Gaussian 16, v. B.01 application
jenicek@skirit:~$ module list # shows currently loaded applications in your environment
jenicek@skirit:~$ module unload g16-B.01 # unloads Gaussian 16, v. B.01 application

Users can install their own software. If you would like to install a new application or a new version of an application, read How to install an application or contact User support.

Request resources and specify scratch directory

Related topics
PBS options detailed guide
PBS reference card [PDF]
Fairshare

Running a calculation on the grid is not as straightforward as running it on your local computer. You cannot, for example, do this:

jenicek@skirit:~$ g16 <h2o.inp # run Gaussian16 calculation with h2o.inp input file straightaway

or

jenicek@skirit:/cmake/my_code/build$ make # compile some VERY large code

Never do this! This is exactly what we mean by "running a calculation on a frontend", which is prohibited. The correct way is to send the calculation (commonly called "a job") via the PBS (Portable Batch System) scheduling system to the computers dedicated solely to calculations. PBS keeps track of the computational resources across the whole grid and runs the job only after enough resources have been freed.

For PBS to decide about the job at hand, the user must indicate how many resources (= number of processors, time and memory) the job will probably need and where the temporary files shall be located. This information is passed in the qsub command options.

The very basic (minimal) syntax of the qsub command options is as follows. The -l option (lowercase "L") introduces a resource request, colons (:) separate the individual items, and each item is a <resource>=<value> pair.

qsub -l select=mem=4gb:scratch_local=1gb -l walltime=2:00:00

where

  • mem is the size of memory that will be reserved for the job (4 GB in this example, default 400 MB),
  • scratch_local=1gb specifies the size and type of disk space where temporary files will be stored (1 GB in this example, no default)
  • walltime is the maximum time the job will run, set in the format hh:mm:ss (2 hours in this example, default 24 hours)

The qsub command has many more options than the three shown here; for example, it is possible to specify the number of computational nodes and processors, or to send an email notification when the job terminates. More in-depth information about PBS Pro commands (with examples) can be found on the page About scheduling system.
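
When the same kind of job is submitted repeatedly, the resource request can be assembled from a few variables before it is passed to qsub. The following is a minimal sketch using only options that appear in this guide (select, ncpus, mem, scratch_local, walltime); the function name build_qsub_args is hypothetical, and the final command is only printed, not run.

```shell
# build_qsub_args NCPUS MEM SCRATCH WALLTIME
# Print the resource options for a job on one machine with local scratch.
build_qsub_args() {
    printf '%s\n' "-l select=1:ncpus=$1:mem=$2:scratch_local=$3 -l walltime=$4"
}

args=$(build_qsub_args 2 4gb 10gb 1:00:00)

# Dry run: show the command instead of submitting anything.
echo "qsub $args myJob.sh"
```

Printing the command first makes it easy to check the request before a real submission.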

Types of scratches

Related topics
Types of scratch

Most applications produce some temporary files during the calculation. Scratch is a directory where these files are placed. There is no default scratch, so the user must always choose one. There are three possible choices for a scratch:

scratch_local

  • locally on the node where calculation is done
  • temporary files are saved in /scratch/username/job_ID/ directory
  • reasonably fast, available on all machines
  • choose this option if you have no special requirements about scratch

scratch_shared

  • network volume available for all the nodes of a specific cluster
  • mounted to directory /scratch.shared which is shared between all clusters in a given location (city)
  • read/write operation slower than on local scratch
  • useful if you need to run more than one application that need access to the same data

scratch_ssd

  • with this option scratch will be located on one of SSD disks
  • SSD - type of disk with faster read/write operations than "normal" hard drive, but with smaller capacity
  • useful if your application creates/reads from a lot of files
  • number of nodes with SSD disks is limited - choosing this option may result in longer queuing time

After the job is finished, the scratch directory and all its content will be erased. Depending on how much free space there is on the computational node this may happen immediately or after a few days.

Batch or interactive job?

Related topics
How to get graphical interface for interactive jobs
Tutorial on using the trap command in batch script

A job can be run in two ways: either by creating a batch script or interactively. Both approaches do in essence the same thing; however, depending on circumstances, one may be preferable over the other.


Note: running an interactive job is not the same as running it straightaway on frontend. Here, too, the user first requests resources, waits until they are granted and only then can do what he/she likes.

Run interactive job

An interactive job is requested via the qsub command with the -I (uppercase "i") option.

jenicek@skirit:~$ qsub -I -l select=1:ncpus=2:mem=4gb:scratch_local=10gb -l walltime=1:00:00

The command in this example submits an interactive job that will run on a machine with 2 processors, use up to 4 GB of RAM and 10 GB of local scratch, and last at most 1 hour. After hitting Enter, a line similar to the following will appear:

qsub: waiting for job 11681412.arien-pro.ics.muni.cz to start

After some time, two lines similar to the following will appear.

qsub: job 11681412.arien-pro.ics.muni.cz ready # in this example, 11681412 is the ID of interactive job
jenicek@elmo5-26:~$ # note that user "jenicek" has been moved from a frontend (skirit) to a computational node (in this case elmo5-26)

Now you can run calculations, compile or archive (tar) files on the command line, e.g.

jenicek@elmo5-26:~$ module load g16-B.01 # load Gaussian 16
jenicek@elmo5-26:~$ g16 <h2o.com >h2o.out   # run Gaussian calculation with "h2o.com" input file, output will be in "h2o.out" file

Unless you log out, after 1 hour you will get the following message:

jenicek@elmo5-26:~$ =>> PBS: job killed: walltime 3630 exceeded limit 3600
logout

qsub: job 11681412.arien-pro.ics.muni.cz completed

This means the PBS scheduling system detected that a resource has run out (in this case time) and has therefore terminated the job.
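
The limit of 3600 in the message above is the requested walltime of 1:00:00 expressed in seconds. As a small illustration, the conversion from the hh:mm:ss format can be sketched as follows; the function name walltime_to_seconds is hypothetical and the input is assumed to have exactly three colon-separated fields.

```shell
# walltime_to_seconds HH:MM:SS
# Convert a walltime in hh:mm:ss format to a number of seconds.
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

limit=$(walltime_to_seconds 1:00:00)
echo "walltime 1:00:00 = $limit s"
```

awk treats each field as a decimal number, so values with leading zeros such as 00:05:30 are handled as well.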

Advantages of interactive jobs

  • Better overview of what you are doing, flexibility (good while compiling, archiving etc.)
  • Probably easier to learn (no need to write a batch script and risk making a mistake)
  • A graphical user interface is available for interactive jobs

Disadvantages of interactive jobs

  • Utilization of computational nodes is not optimal (waiting for user's input)
  • Not suitable to handle long-running jobs (closing a terminal or failed Internet connection will terminate the job!)
  • Difficult to handle jobs working with multiple files

Run batch jobs

In the case of a batch job, all the information is packed in a script, a short piece of code. The script is then submitted via the qsub command and no further care needs to be taken. In batch jobs it is possible to specify the PBS options either on the command line or inside the script. Both ways are correct; choose whichever you prefer.

Specifying the PBS options inside the script is done via #PBS line prefix + an option. The batch script in the following example is called myJob.sh.


#!/bin/bash
#PBS -N myFirstJob
#PBS -l select=1:ncpus=4:mem=4gb:scratch_local=10gb
#PBS -l walltime=1:00:00 
#PBS -m ae
# The 4 lines above are options for the scheduling system: the job will run for 1 hour at most; 1 machine with 4 processors, 4 GB of RAM and 10 GB of scratch space is requested; an email notification will be sent when the job aborts (a) or ends (e)

# define a DATADIR variable: directory where the input files are taken from and where output will be copied to
DATADIR=/storage/brno3-cerit/home/jenicek/test_directory # substitute your real username and path

# append a line to a file "jobs_info.txt" containing the ID of the job, the hostname of node it is run on and the path to a scratch directory
# this information helps to find a scratch directory in case the job fails and you need to remove the scratch directory manually 
echo "$PBS_JOBID is running on node $(hostname -f) in a scratch directory $SCRATCHDIR" >> "$DATADIR/jobs_info.txt"

# load the Gaussian application module, version 03
module add g03

# test if scratch directory is set
# if scratch directory is not set, issue error message and exit
test -n "$SCRATCHDIR" || { echo >&2 "Variable SCRATCHDIR is not set!"; exit 1; }

# copy input file "h2o.com" to scratch directory
# if the copy operation fails, issue error message and exit
cp "$DATADIR/h2o.com" "$SCRATCHDIR" || { echo >&2 "Error while copying input file(s)!"; exit 2; }

# move into scratch directory
cd "$SCRATCHDIR"

# run Gaussian 03 with h2o.com as input and save the results into h2o.out file
# if the calculation ends with an error, issue error message and exit
g03 <h2o.com >h2o.out || { echo >&2 "Calculation ended up erroneously (with a code $?) !!"; exit 3; }

# copy the output to user's DATADIR or exit in case of failure
cp h2o.out "$DATADIR/" || { echo >&2 "Result file(s) copying failed (with a code $?) !!"; exit 4; }

# clean the SCRATCH directory
clean_scratch
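
A common refinement of such a script is to register the cleanup with the shell's trap builtin, so that it also runs when the script exits early on an error. The sketch below only demonstrates the mechanism outside PBS, under stated assumptions: SCRATCHDIR is a temporary directory created with mktemp as a stand-in for the scratch directory PBS would provide, and rm -rf stands in for clean_scratch. Whether automatic cleanup is desirable depends on whether you want to inspect the scratch directory after a failure.

```shell
# Stand-in for the scratch directory; in a real job PBS sets SCRATCHDIR.
SCRATCHDIR=$(mktemp -d)

# Run the "job body" in a subshell so its exit does not end this demo.
(
    # On any exit from the subshell, remove the scratch directory
    # (a real batch script would call clean_scratch here instead).
    trap 'rm -rf "$SCRATCHDIR"' EXIT

    echo "temporary data" > "$SCRATCHDIR/tmp.dat"

    # Simulate a failing calculation, handled as in the batch script.
    false || { echo >&2 "Calculation ended up erroneously!"; exit 3; }
)
status=$?

echo "job body exited with code $status"
if [ -d "$SCRATCHDIR" ]; then
    echo "scratch directory still exists"
else
    echo "scratch directory was removed"
fi
```

The exit code of the failed step is preserved, so the calling shell still sees which stage of the job went wrong.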

You can then submit your job via qsub command.

jenicek@skirit:~$ qsub myJob.sh # submit a batch script named "myJob.sh"; the suffix is not necessary, contrary to the Windows filename convention
11733571.arien-pro.ics.muni.cz # job received ID 11733571 from a PBS server arien-pro.ics.muni.cz
jenicek@skirit:~$

If you want to specify the requested resources outside the batch script, move the PBS options to the submitting command: qsub -l select=1:ncpus=4:mem=4gb:scratch_local=10gb -l walltime=1:00:00 myJob.sh, in the same way as when running the job interactively. For a full description of PBS options, consult the section About scheduling system.

PBS servers

Because the capacity of any server is limited and the number of jobs can be very large, there are three PBS servers: arien-pro.ics.muni.cz, wagap-pro.cerit-sc.cz and pbs.elixir-czech.cz. The server pbs.elixir-czech.cz stands a bit apart, as its machines are reserved for the Elixir group. Typically the user will come across the first two, arien and wagap.

Each of the PBS servers "sees" a different and mutually exclusive set of computing machines. Similarly, every frontend is connected to one of the three PBS servers. As a consequence, the frontend from which a job is submitted determines which PBS server will manage the job and on which computational nodes it will run. In this sense the frontends are not equivalent.

The triad of frontend, PBS server and set of computing machines is summarized in the table below. Note that the list of computing machines is not complete!

PBS server             Frontend(s)            Computing machines
arien-pro.ics.muni.cz  skirit.ics.muni.cz     lex.ncbr.muni.cz
                       alfrid.meta.zcu.cz     zubat.ncbr.muni.cz
                       tarkil.grid.cesnet.cz  perian41-56.ncbr.muni.cz
                       nympha.zcu.cz          aman.ics.muni.cz
                       charon.metacentrum.cz  ...
                       minos.zcu.cz
                       perian.ncbr.muni.cz
                       onyx.ncbr.muni.cz
wagap-pro.cerit-sc.cz  zuphux.cerit-sc.cz     ursa.cerit-sc.cz
                                              urga.cerit-sc.cz
                                              zefron.cerit-sc.cz
                                              phi.cerit-sc.cz
                                              ...
                                              All physical machines with cerit-sc.cz ending are managed by wagap, and vice versa
pbs.elixir-czech.cz    elmo.elixir-czech.cz   elmoXX.hw.elixir-czech.cz
                                              All physical machines with elixir-czech.cz ending are managed by elixir, and vice versa

Tips for job submitting

  • The PBS servers share some queues and may pass jobs between themselves to optimize their load. However, not all queues are shared, which may result in a long waiting time if you happen to submit the job on a PBS server whose computing machines are currently overloaded. Especially for more resource-hungry jobs it may be a good idea to check the current load of physical machines and choose the less busy PBS server (= submit the job from a frontend associated with the less busy PBS server).
  • The PBS servers have separate fairshare, meaning a particular PBS server does not take into account how many resources your jobs have already consumed on the remaining PBS servers. If your jobs are getting de-prioritized because they have already consumed lots of resources, choosing a different PBS server may cut down the waiting time.

Track your job

Related topics
[List of your jobs]

The qsub command returns a job ID (e.g. 1733571), which you can use to track or delete your job. If you are logged on to a frontend managed by the same PBS server as the one which tracks the job, the number alone suffices to identify the job. In other cases, you have to use the full job ID = number + the name of the PBS server (e.g. 1733571.arien-pro.ics.muni.cz).
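
In scripts it is sometimes handy to split a full job ID into its two parts. A minimal sketch using standard shell parameter expansion; the variable names are illustrative, and the ID is the one shown in the submission example of this guide.

```shell
# A full job ID as printed by qsub: <number>.<PBS server name>
full_id="11733571.arien-pro.ics.muni.cz"

# Everything before the first dot is the job number.
job_number=${full_id%%.*}

# Everything after the first dot is the PBS server name.
pbs_server=${full_id#*.}

echo "job number: $job_number"
echo "PBS server: $pbs_server"
```

The job number alone is what you would pass to qstat or qdel on a frontend of the same PBS server; elsewhere the full ID is needed.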

You can track jobs via the online application PBSmon.

It is also possible to track your job on the CLI via its ID (jobID). This is done with the qstat command. For example:

qstat -u jenicek # list all jobs of user "jenicek" running or queuing on the current PBS server
qstat -u jenicek @arien-pro.ics.muni.cz @wagap-pro.cerit-sc.cz @pbs.elixir-czech.cz # list all running or queuing jobs of user "jenicek" on all PBS servers
qstat -xu jenicek # list finished jobs for user "jenicek" 
qstat -f <jobID> # list details of the running or queueing job with a given jobID
qstat -xf <jobID> # list details of the finished job with a given jobID

After submitting a job and checking its status, you will typically see something like the following.

jenicek@skirit~: qstat -u jenicek # show the status of all running or queuing jobs submitted by user "jenicek"
arien-pro.ics.muni.cz: 
                                                                 Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
11733550.arien-pro.* jenicek q_2h     myJob.sh         --    1   1    1gb 00:05 Q   --

The letter under the header 'S' (status) gives the status of the job. The most common states are:

  • Q – queued
  • R – running
  • F – finished

Apart from these, on the PBSmon job list you can quite often see jobs with the status "M" (moved). This means the job has been moved from one PBS Pro server to another.
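
The state letter can also be picked out of the qstat listing automatically. The sketch below works on a copy of the sample line shown above instead of calling qstat, and assumes the column layout of that sample (state in the 10th whitespace-separated column, job name without spaces); the function name job_state is hypothetical.

```shell
# One job line copied from the sample qstat output in this guide.
qstat_sample='11733550.arien-pro.* jenicek q_2h     myJob.sh         --    1   1    1gb 00:05 Q   --'

# job_state: read job lines on standard input and print the state column.
job_state() {
    awk '{ print $10 }'
}

state=$(printf '%s\n' "$qstat_sample" | job_state)

# Translate the letter using the states listed in this guide.
case $state in
    Q) echo "job is queued" ;;
    R) echo "job is running" ;;
    F) echo "job is finished" ;;
    M) echo "job was moved to another PBS server" ;;
    *) echo "other state: $state" ;;
esac
```

With a real listing, the header lines would have to be skipped before the job lines are passed to the function.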

Tracking running jobs

Follow these steps if you would like to check the outputs of a job which has not finished yet:

1. Find out which machine your job is running on: http://metavo.metacentrum.cz/pbsmon2/person -> "Show my jobs". A page with a list of your jobs will be displayed.

A click on the job's ID will open a page with full information about the job, including the hostname (= machine where the job is running) and a path to the scratch directory.

2. Log in to the machine from any frontend using the ssh command, e.g.

ssh zapat112.cerit-sc.cz

3. Navigate to the /var/spool/pbs/spool/ directory and examine the files:

  • $PBS_JOBID.OU for standard output (stdout – e.g. “1234.arien-pro.ics.muni.cz.OU”)
  • $PBS_JOBID.ER for standard error output (stderr – e.g. “1234.arien-pro.ics.muni.cz.ER”)

To watch a file continuously, you can also use the command tail -f.

jenicek@zapat112.cerit-sc.cz:/var/spool/pbs/spool$ tail -f 1234.arien-pro.ics.muni.cz.OU # this command outputs appended data as the file grows

Get e-mail notification about a job

To keep track of submitted jobs, some users prefer to get notifications via e-mail. To do this, add the following lines to your batch script:

#PBS -M your.name@your.email.com # specify recipient's email; if this option is not used, your registered email will be used
#PBS -m abe # send mail when the job begins (b), aborts (a) or ends (e)

or

#PBS -m ae # send mail when aborts (a) or ends (e)

Caution: different e-mail providers apply various filters to protect their users from spam. If you submit a lot of jobs with the PBS -m abe option, they may end up in your spam folder.

Examine job's standard output and standard error output

When a job is completed (no matter how), two files are created in the directory from which you have submitted the job. One represents standard output and the other one standard error output:

 <job_name>.o<jobID> # contains job's output data
 <job_name>.e<jobID> # contains job's standard error output

The standard error output contains all the error messages which occurred during the calculation. It is the first place to look if the job has failed. The messages collected in the standard error output are a valuable source of information about why the job has failed. If you contact user support to ask for help, do not remove the error file, but send it as an attachment together with your request.
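
A quick first check after a job finishes is whether its error file contains anything at all. The following is a minimal sketch under stated assumptions: it follows the <job_name>.e<jobID> naming described above, the function name check_job_errors is hypothetical, and the files are fakes created in a temporary directory. Note that a non-empty error file does not always mean failure, since many programs also print warnings there.

```shell
# Create fake output files of two finished jobs in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "result" > myJob.sh.o1234
: > myJob.sh.e1234
echo "result" > myJob.sh.o1235
echo "Error while copying input file(s)!" > myJob.sh.e1235

# check_job_errors JOB_NAME JOB_NUMBER
# Return 1 and print the error file if it is not empty, return 0 otherwise.
check_job_errors() {
    err_file="$1.e$2"
    if [ -s "$err_file" ]; then
        echo "$err_file is not empty:"
        cat "$err_file"
        return 1
    fi
    echo "$err_file is empty or missing"
    return 0
}

check_job_errors myJob.sh 1234
check_job_errors myJob.sh 1235 || echo "job 1235 needs attention"
```

The test [ -s file ] is true only for an existing file of non-zero size, which is why an empty and a missing error file are reported the same way here.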

You can copy these files to your personal computer (scp command) for further processing. You can also examine them directly on CLI by any of the following commands.

jenicek@skirit.cz:~$ cat myjob.sh.o1234 # print whole content of file "myjob.sh.o1234" on standard output
jenicek@skirit.cz:~$ cat myjob.sh.o1234 | more # print whole content of file "myjob.sh.o1234" on standard output screenful-by-screenful (press spacebar to go to another screen)
jenicek@skirit.cz:~$ vi myjob.sh.o1234 # open file "myjob.sh.o1234" in text editor vi 
jenicek@skirit.cz:~$ less myjob.sh.o1234 # open file "myjob.sh.o1234" read only

Job termination

It often happens that a user submits a job and only then realizes something was wrong or missing in the input. A waiting or running job can be cancelled with the qdel command.

jenicek@skirit~: qdel 21732596.pbs.elixir-czech.cz # delete the job with full job ID "21732596.pbs.elixir-czech.cz"

Log off

Logging off is simple.

jenicek@skirit:~$ exit
logout
Connection to skirit.metacentrum.cz closed.

Logging off will terminate any currently running interactive jobs. Batch jobs do not depend on whether the user is logged in and will not be affected.