How to compute/Requesting resources

Related topics
PBS Professional
Scheduling system
Resources request examples

Since January 2017, MetaCentrum has been using a new scheduling system, PBS Professional. The old TORQUE environment is no longer accessible. How to request resources in the new PBS Pro syntax is described in detail on this page.

General syntax of a PBS Pro command:


qsub -l select=1:ncpus=1:mem=1gb -l walltime=1:00:00 -l option1 -l option2 ... script.sh
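
For instance, the following command (the chosen resources and the script name myjob.sh are only placeholders) requests one chunk with one processor, 1 GB of memory and 1 GB of local scratch for two hours:

qsub -l select=1:ncpus=1:mem=1gb:scratch_local=1gb -l walltime=2:00:00 myjob.sh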


Basic options

Please note: Only one select argument is allowed at a time.

  • maximal duration of a job – set by -l walltime=[[hh:]mm:]ss, the default walltime is 24:00:00. Queues q_* (such as q_2h, q_2d etc.) are not accessible for job submission; the default routing queue automatically chooses an appropriate time queue based on the specified walltime. Examples (a complete command combining the basic options is shown after this list):
    • -l walltime=1:00:00 (one hour)
    • -l walltime=24:00:00 (one day)
    • -l walltime=120:00:00 (5 days)
  • number of machines and processors – the number of processors and "chunks" is set with -l select=[number]:ncpus=[number]. PBS Pro terminology defines a "chunk" as an indivisible set of resources allocated to a job on one physical node; a job with more chunks is the analogue of a job with more nodes in the old TORQUE system. Chunks can be placed next to each other on one machine, always on different machines, or freely according to available resources. Examples:
    • -l select=1:ncpus=2 – two processors on one chunk
    • -l select=2:ncpus=1 – two chunks each with one processor
    • -l select=1:ncpus=1+1:ncpus=2 – two chunks, one with one processor and second with two processors
    • -l select=2:ncpus=1 -l place=pack – all chunks must be on one node (if there is not any big enough node, the job will never run)
    • -l select=2:ncpus=1 -l place=scatter – each chunk will be placed on different node (default for old TORQUE system)
    • -l select=2:ncpus=1 -l place=free – chunks may be placed on nodes arbitrarily, according to the actual availability of resources on the nodes (chunks can end up on one or more nodes; default behaviour of PBS Pro)
    • if you are unsure about the number of processors needed, ask for an exclusive reservation of the whole machine using the "-l place=" parameter:
    • -l select=2:ncpus=1 -l place=exclhost – request for 2 exclusive nodes (without CPU and memory limit control)
    • -l select=3:ncpus=1 -l place=scatter:excl – exclusivity can be combined with a chunk placement specification
    • -l select=102:ncpus=1 -l place=group=cluster – 102 CPUs within a single cluster
  • amount of temporary scratch – fast and reliable storage required for job processing. Always specify the type and size of scratch; a job has no default scratch assigned. The scratch type can be one of scratch_local|scratch_ssd|scratch_shared. Examples:
    • -l select=1:ncpus=1:mem=4gb:scratch_local=10gb
    • -l select=1:ncpus=1:mem=4gb:scratch_ssd=1gb
    • -l select=1:ncpus=1:mem=4gb:scratch_shared=1gb
  • after requesting scratch, the following variables are available in the job's environment:
$SCRATCH_VOLUME=<dedicated capacity>
$SCRATCHDIR=<directory>
$SCRATCH_TYPE=<scratch_local|scratch_ssd|scratch_shared>
  • amount of needed memory – a job is implicitly assigned 400 MB of memory unless specified otherwise. Examples:
    • -l select=1:ncpus=1:mem=1gb
    • -l select=1:ncpus=1:mem=10gb
    • -l select=1:ncpus=1:mem=200mb
  • licence – requested with the -l parameter. Example:
    • -l select=3:ncpus=1 -l walltime=1:00:00 -l matlab=1 – one licence for Matlab
  • sending informational emails about the job state. For example:
    • -m abe – sends an email when the job aborts (a), begins (b) and completes/ends (e)
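
As a combined example (the script name combined_job.sh is a placeholder), a job requesting two chunks, each with two processors, 4 GB of memory and 10 GB of local scratch, a one-day walltime and e-mail notifications can be submitted as:

qsub -l select=2:ncpus=2:mem=4gb:scratch_local=10gb -l walltime=24:00:00 -m abe combined_job.sh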

You can use the Command qsub refining tool to define the job requirements.

Submitting a job to nodes with a specific OS

  • To submit a job to a machine with Debian 9, use "os=debian9" in the job specification:
 zuphux$ qsub -l select=1:ncpus=2:mem=1gb:scratch_local=1gb:os=debian9 …
  • To run a job on a machine with any OS, use "os=^any":
 zuphux$ qsub -l select=1:ncpus=2:mem=1gb:scratch_local=1gb:os=^any …
  • If you experience any library or application compatibility problems on Debian 9, add the module debian8-compat.
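
Inside the job script, the compatibility module is then loaded in the same way as any other module (a minimal sketch):

 # load the Debian 8 compatibility libraries before starting the application
 module add debian8-compat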

Advanced options

Resources of computational nodes

The list of attributes provided here may not be complete. You can find the current list on the web in the Node properties section.

  • node with a specific feature – the value of a feature must always be specified (either True or False). Examples:
    • -l select=1:ncpus=1:cluster=tarkil – request for a node from cluster tarkil
    • -l select=1:ncpus=1:cluster=^tarkil – request for a node except cluster tarkil
  • request for a specific node – always use the shortened name. Example:
    • -l select=1:ncpus=1:vnode=tarkil3 – request for node tarkil3.metacentrum.cz
  • request for a host – use the full host name. Example:
    • -l select=1:ncpus=1:host=tarkil3.grid.cesnet.cz
  • cgroups – request memory usage limiting via cgroups; limiting memory by cgroups is not enabled on all machines. Example:
    • -l select=1:ncpus=1:mem=5gb:cgroups=memory
  • cgroups – request CPU usage limiting via cgroups; limiting CPU by cgroups is not enabled on all machines. Example:
    • -l select=1:ncpus=1:mem=5gb:cgroups=cpuacct
  • networking cards – "-l place" is also used to request InfiniBand:
    • -l select=3:ncpus=1 -l walltime=1:00:00 -l place=group=infiniband
    • -l select=3:ncpus=1:infiniband=brno – specific value of infiniband must be requested inside the chunk
  • CPU flags – restrict submission to nodes with specific CPU flags. Example:
    • -l select=cpu_flag=sse3
    • the list of available flags can be obtained with the command pbsnodes -a | grep resources_available.cpu_flag | awk '{print $3}' | tr ',' '\n' | sort | uniq – the list changes whenever nodes are added or removed, so it is wise to check the available flags before requesting anything special.
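
Similarly, the properties advertised by a single node can be inspected directly (the node name below is only an example):

 pbsnodes tarkil3.grid.cesnet.cz | grep resources_available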

Moving a job to another queue

A job can be moved to a different queue with the qmove command; its arguments are the destination queue (in the form queue@server) and the job ID. For example, to move job 475337 to the uv queue on the wagap-pro (CERIT-SC) server:

qmove uv@wagap-pro.cerit-sc.cz 475337.wagap-pro.cerit-sc.cz


GPU computing

  • For computing on GPUs, a GPU queue is used (either gpu or gpu_long can be specified). GPU queues are accessible to all MetaCentrum members; one GPU card is assigned by default. The IDs of the assigned GPU cards are stored in the CUDA_VISIBLE_DEVICES variable. Example (a minimal job script sketch follows):
    • -l select=ncpus=1:ngpus=2 -q gpu
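
A minimal sketch of a GPU job script; the chosen memory, walltime and the binary ./my_gpu_program are placeholder assumptions, adjust them to your application:

 #!/bin/bash
 #PBS -q gpu
 #PBS -l select=1:ncpus=1:ngpus=1:mem=4gb
 #PBS -l walltime=4:00:00

 # the scheduler stores the IDs of the assigned GPU card(s) here
 echo "Assigned GPU(s): $CUDA_VISIBLE_DEVICES"

 # run the application on the assigned card(s) (placeholder binary)
 ./my_gpu_program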

Job Array

  • The job array is submitted as:
 # general command
 $ qsub -J X-Y[:Z] script.sh
 # example
 $ qsub -J 2-7:2 script.sh
  • X is the first index of a job, Y is the upper bound of the index and Z is an optional index step; the example command above therefore generates subjobs with indexes 2, 4 and 6.
  • The job array is represented by a single job whose job number is followed by "[]"; this main job provides an overview of unfinished subjobs.
$ qstat -f 969390'[]' -x | grep array_state_count
    array_state_count = Queued:0 Running:0 Exiting:0 Expired:0 
  • An example of sub job ID is 969390[1].arien-pro.ics.muni.cz.
  • The sub job can be queried by a qstat command (qstat -t).
  • Inside a subjob, PBS Pro uses PBS_ARRAY_INDEX instead of TORQUE's PBS_ARRAYID. The variable PBS_ARRAY_ID contains the job ID of the main job. A sketch of a simple array job script follows.
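
A minimal sketch of an array job script; the ./process binary and the input_*.txt file names are only placeholders:

 #!/bin/bash
 #PBS -l select=1:ncpus=1:mem=1gb
 #PBS -l walltime=1:00:00

 # each subjob selects its own input according to its index
 # (with qsub -J 2-7:2 the indexes are 2, 4 and 6)
 echo "Main job: $PBS_ARRAY_ID, subjob index: $PBS_ARRAY_INDEX"

 # placeholder application and input file
 ./process input_${PBS_ARRAY_INDEX}.txt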

MPI processes

  • The number of MPI processes to run in one chunk is specified by mpiprocs=[number].
  • For each MPI process there is one line in the nodefile $PBS_NODEFILE specifying the allocated vnode.
    • -l select=3:ncpus=2:mpiprocs=2 – 6 MPI processes (nodefile contains 6 lines with names of vnodes), 2 MPI processes always share 1 vnode with 2 CPU
  • The number of OpenMP threads to run in one chunk is set by ompthreads=[number]; the default is ompthreads=ncpus (i.e. 2 OpenMP threads per chunk in the example above). A sketch of a simple MPI job script follows.
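
A minimal sketch of an MPI job script matching the example above; the module name openmpi and the binary ./my_mpi_program are placeholders, adjust them to your environment:

 #!/bin/bash
 #PBS -l select=3:ncpus=2:mpiprocs=2:mem=1gb
 #PBS -l walltime=2:00:00

 # $PBS_NODEFILE lists one vnode per MPI process (6 lines here)
 module add openmpi

 # start the 6 MPI processes on the allocated vnodes (placeholder binary)
 mpirun -machinefile $PBS_NODEFILE ./my_mpi_program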

Example of a script defining workflow of a job

Related topics
Working with data in a job

The following script describes the steps executed inside the job, including data handling.

 #!/bin/bash
 #PBS -l select=1:ncpus=2:mem=1gb:scratch_local=4gb
 #PBS -l walltime=04:00:00
 # modify/delete the directives above according to your job's needs
 # Please note that only one select= argument is allowed at a time.

 # clean the SCRATCH directory when an error occurs or the job terminates
 trap 'clean_scratch' TERM EXIT

 DATADIR="/storage/praha1/home/$PBS_O_LOGNAME/"

 # copy own data into scratch directory
 cp $DATADIR/input.txt $SCRATCHDIR || exit 1
 cd $SCRATCHDIR || exit 2

 # load the application modules needed for the computation, for example:
 module add jdk-8

 # run the computation itself
 java -jar input.jar

 # copy the results from the scratch directory back to the storage; if the copy fails, the scratch is not deleted
 cp output.txt $DATADIR || export CLEAN_SCRATCH=false

The script is submitted to PBS Pro with the command:

qsub script.sh

This example ensures cleaning of the scratch directory after the job terminates (the clean_scratch command). If copying a file fails, the scratch directory is not erased and the files can be retrieved manually. The variable $PBS_O_LOGNAME holds the name of the current user. Another useful variable is $PBS_O_WORKDIR, which holds the path to the directory where qsub was executed.
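
If the input data are located in the directory from which the job was submitted, the DATADIR line of the script above could, for instance, be replaced with (an illustrative alternative, not a required change):

 DATADIR="$PBS_O_WORKDIR"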

qstat

  • A completed job is in state "F" (finished).
  • The qstat command shows only waiting and running jobs; finished jobs are displayed with qstat -x.
  • For smaller groups of jobs, PBS Pro can display the estimated start time (Est Start Time); this is done with the command qstat -T.
qstat -u <login> lists all running or queued jobs of a user
qstat -x -u <login> lists the finished jobs of a user
qstat -f <jobID> lists details of a running or queued job
qstat -x -f <jobID> lists details of a finished job

Setting a Job’s Priority

qsub -p

The "-p priority" option defines the priority of the job. The priority argument must be an integer between -1024 (lowest priority) and +1023 (highest priority), inclusive. The default is no priority, which is equivalent to a priority of zero.
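
For example, the following submits a job with a priority of 500 (the resource request and the script name are placeholders):

qsub -p 500 -l select=1:ncpus=1:mem=1gb -l walltime=1:00:00 script.sh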

More details: page 83 of http://www.pbsworks.com/documentation/support/PBSProUserGuide10.4.pdf