Running the jobs


Access to compute nodes is provided via a scheduler and an associated batch queuing system. To run an application, the user submits a 'job' to the scheduler. The scheduler then determines, based on fairshare, on the available and requested resources, and on the other jobs waiting in the queue, when the job should run. At that time, the scheduler hands the job to the queuing system, which starts it.


Fairshare

The MetaCentrum batch system uses a fairshare scheduling policy. Fairshare is a mechanism which incorporates historical resource utilization into job feasibility and priority decisions.

  • This policy tries to distribute resources fairly among the groups and users of the system. Fairshare dynamically adjusts the priority of jobs in the queue, so jobs are not necessarily executed in the order in which they were submitted.
  • When a job belongs to a user who has consumed a lot of system resources (CPU, RAM, GPU and scratch disk) in the past few days, the priority of the job is decreased. Conversely, if the user has consumed few resources, the priority is set high and the job is placed closer to the top of the queue.
  • The system tracks the usage of CPU, RAM, GPU and scratch disk. The consumption of each resource is normalized to its CPU-equivalent using the following formula (a worked example is given after this list):
Usage_Resource = (Resource_used/Resource_total) * CPU_total * Resource_Weight
  • The normalized resource consumptions are summed and added to the user's fairshare usage (total resource consumption):
Fairshare_usage += Usage_CPU + Usage_RAM + Usage_Scratch + Usage_GPU
  • The time scope of the recorded usage data is limited. Simply put, the importance of the data decreases with time: yesterday matters more than the day before yesterday, and so on. The memory typically spans 30 days. The more the user has consumed in recent days, the lower his or her priority is.
  • In addition, users are rewarded for publications that acknowledge MetaCentrum/CERIT-SC: a user with a higher number of publications (reported in our system) is prioritized over users with fewer publications.
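
For illustration, consider a hypothetical cluster with 1000 CPUs in total, 4096 GB of RAM in total, and a RAM weight of 1.0 (these numbers are made up for the example and are not the actual MetaCentrum configuration). A job that occupied 100 GB of RAM would then contribute

Usage_RAM = (100/4096) * 1000 * 1.0 ≈ 24.4

CPU-equivalents to the user's fairshare usage, i.e. holding 100 GB of RAM counts roughly as much as running on 24 CPUs for the same period.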

Job Lifecycle

[Figure: PBS job scheduler life cycle]

The life cycle of a job can be divided into four stages: creation, submission, execution, and finalization.

  • Creation - Typically, a submission script is written to hold all of the parameters of a job. These parameters include how long the job should run (walltime), what resources it needs, and what to execute.
  • Submission - A job is submitted with the qsub command. The submission script is a shell script that describes the processing to carry out (e.g. the application, its input and output) and requests the computer resources (number of CPUs, amount of memory, scratch space) to use for the processing. Directives are job-specific requirements given to the job scheduler; the most important directives are those that request resources, most commonly the walltime limit (the maximum time the job is allowed to run) and the amount of CPU, memory, and scratch space required to run the job. An example of such a script is sketched after this list.
  • Execution - The job is running on the assigned compute node(s).
  • Finalization - When a job completes, any leftover user processes are killed and, by default, the stdout and stderr files are copied to the directory from which the job was submitted.
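
As a rough sketch, a submission script might look like the following. The resource selectors (in particular scratch_local) and the SCRATCHDIR variable follow common MetaCentrum conventions but are assumptions here; consult the current MetaCentrum documentation for the exact syntax required on a given cluster.

#!/bin/bash
#PBS -N my_job                                        # job name shown in the queue
#PBS -l select=1:ncpus=4:mem=8gb:scratch_local=10gb   # 1 node, 4 CPUs, 8 GB RAM, 10 GB local scratch (illustrative values)
#PBS -l walltime=02:00:00                             # maximum allowed run time of 2 hours

# work in the scratch directory assigned to the job (SCRATCHDIR is a MetaCentrum convention)
cd $SCRATCHDIR

# copy the input from the directory where the job was submitted, run the application, copy the results back
cp $PBS_O_WORKDIR/input.dat .
./my_application input.dat > output.dat
cp output.dat $PBS_O_WORKDIR/

The script is then submitted with 'qsub my_script.sh'; the scheduler reads the #PBS directive lines at submission time, and the rest of the script is executed on the compute node once the job starts.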

Backfilling

Backfill is a scheduling optimization which allows the scheduler to make better use of available resources by running jobs out of order: while a large job waits for its full resource request to become available, short jobs are allowed to run in the resulting gaps.
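
In practice this means that a job with a modest, realistic walltime and resource request has a better chance of being started early in such a gap. For example (the resource syntax is an illustrative assumption, as above):

qsub -l select=1:ncpus=1:mem=2gb -l walltime=00:30:00 my_short_job.sh   # a 30-minute job that can slip into a scheduling gap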