PBS Professional

Related topics
Official documentation
How to compute
PBS Pro Quick Start [PDF]

This page describes the reasons for using the PBS Professional scheduling system and how it differs from the old TORQUE system. How to request resources in the new PBS Pro environment is described on the page Requesting resources.


Why are we switching to PBS Professional?

The main scheduling system used in MetaCentrum until spring 2017 was a significantly modified and extended TORQUE Resource Manager version 2.4. However, during its 6 years of operation our version (2.4) fell far behind the main development version 6.0.2 from Adaptive Computing. At the same time, we ran into fundamental problems in the production version that decreased the throughput and scalability of the system.

With regard to future development and the expected increase in MetaCentrum computational capacity, our scheduling system had to be modernized. In summer 2016 the source code of the competing system PBS Professional, which we had used before, became available. Subsequent analysis showed that this scheduling system meets almost all of our requirements: it supports scalability and is compatible with other PBS Pro systems. It also offers many features that we did not support before. For these reasons, we have for now abandoned the time-consuming transition to the current version of TORQUE. In our opinion, switching to the open PBS Professional system is the easier and more interesting choice.

From both the user's and the administrator's point of view, PBS Pro promises better performance as well as new, previously unsupported features, for example:

  • support for Docker containers
  • high throughput and scalability (50,000 nodes, ~1,000,000 threads, >1,000,000 jobs/day)
  • detailed specification of the resources allocated to a job with the parameter "-l select=..." and of their placement on nodes with the parameter "-l place=..."

PBS Pro offers advanced options for specifying required resources. The resources of a parallel job can be specified separately for individual "chunks" (a chunk is an indivisible set of resources allocated to a job on one physical node). It is also possible to influence how these chunks are placed on physical nodes: they can all be placed on one machine "next to each other", always on different machines, or wherever resources are currently available.
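The chunk and placement options described above can be illustrated with a small sketch that assembles the corresponding qsub arguments. The helper name build_resource_args and the resource values are hypothetical and chosen only for illustration. The sketch assumes the standard PBS Pro arrangement keywords "pack" (all chunks on one machine), "scatter" (each chunk on a different machine) and "free" (wherever resources are available); the exact resources accepted by MetaCentrum are described on the page Requesting resources.

```python
# Illustrative helper (hypothetical, not part of PBS Pro or MetaCentrum tooling):
# builds the "-l select=..." and "-l place=..." arguments for qsub.

# Arrangement keywords assumed from standard PBS Pro; sharing and grouping
# options of "place" are deliberately left out of this sketch.
ARRANGEMENTS = ("free", "pack", "scatter")


def build_resource_args(chunks, place=None):
    """Return a list of qsub arguments.

    chunks: list of (count, resources) pairs, where resources is a dict
            such as {"ncpus": 4, "mem": "8gb"}.
    place:  optional arrangement keyword from ARRANGEMENTS.
    """
    specs = []
    for count, resources in chunks:
        fields = [str(count)]
        fields += [f"{name}={value}" for name, value in resources.items()]
        specs.append(":".join(fields))  # one chunk specification

    # Different chunk specifications are joined with "+".
    args = ["-l", "select=" + "+".join(specs)]

    if place is not None:
        if place not in ARRANGEMENTS:
            raise ValueError(f"unknown arrangement: {place!r}")
        args += ["-l", "place=" + place]
    return args


if __name__ == "__main__":
    # Two chunks of 4 CPUs and 8 GB each, every chunk on a different machine.
    print(" ".join(["qsub"] + build_resource_args(
        [(2, {"ncpus": 4, "mem": "8gb"})], place="scatter")))
```

Replacing "scatter" with "pack" in the example would ask for both chunks on the same machine, and omitting the place argument leaves the decision to the scheduler defaults.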

For the reasons mentioned above, we decided to test this scheduling system experimentally on the new MetaCentrum cluster tarkil(1-16).grid.cesnet.cz. As PBS Pro fulfilled our expectations, all MetaCentrum machines were switched to PBS Pro at the end of spring 2017. CERIT-SC is also gradually starting to use PBS Pro, but most of its machines are still managed by the TORQUE scheduling system.

Main differences between PBS Professional and old TORQUE system

  • The job submission syntax differs in PBS Pro. Using "-l nodes=..." is considered outdated and has therefore been disabled. Please use the new "select" syntax, which is described on the page Requesting resources.
  • Always specify the desired walltime. Walltime in PBS Pro must be given in the format [[hh:]mm:]ss.
  • Always specify the size and type of scratch.
  • In the "select" syntax, required resources are divided into "chunks" that can be placed on one or more nodes.
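To make the walltime format [[hh:]mm:]ss from the list above concrete, the following minimal sketch converts between such strings and seconds. The function names are hypothetical and the code only illustrates how the optional fields are read; it is not part of PBS Pro.

```python
# Illustrative conversion for the [[hh:]mm:]ss walltime format
# (hypothetical helper names, not part of PBS Pro).

def walltime_to_seconds(walltime):
    """Convert 'ss', 'mm:ss' or 'hh:mm:ss' to a number of seconds."""
    fields = walltime.split(":")
    if not 1 <= len(fields) <= 3:
        raise ValueError(f"expected [[hh:]mm:]ss, got {walltime!r}")
    if not all(f.isascii() and f.isdigit() for f in fields):
        raise ValueError(f"non-numeric field in {walltime!r}")

    seconds = 0
    for field in fields:
        # Each field to the left is worth 60 times the one to its right.
        seconds = seconds * 60 + int(field)
    return seconds


def seconds_to_walltime(seconds):
    """Format a number of seconds as 'hh:mm:ss'."""
    if seconds < 0:
        raise ValueError("walltime cannot be negative")
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    print(walltime_to_seconds("1:30:00"))  # one and a half hours in seconds
    print(seconds_to_walltime(7200))       # two hours as hh:mm:ss
```

Because the leading fields are optional, a value such as "90" is read as 90 seconds, not 90 minutes, so writing the full hh:mm:ss form is the least ambiguous choice.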

Official documentation

Detailed documentation is available on the website of Altair Engineering, Inc.: PBS Professional documentation

The most important document from the user's point of view is the user manual: PBS Professional User's Guide

PBS Professional Quick Start Guide

Quickstart-pbspro-small.pdf

Known problems

  • MPI libraries are missing for some less frequently used application software; please ask us for help

Please report any problems to meta@cesnet.cz.