How to compute/Job management

From MetaCentrum


Tracking job status

You can track your jobs in the online application PBSmon (https://metavo.metacentrum.cz/pbsmon2/person).


You can also track a job from the terminal using its ID (jobID). For example:

qstat -u <login> # lists all of the user's running or queued jobs on the current PBS server
qstat -u <login> @arien-pro.ics.muni.cz @wagap-pro.cerit-sc.cz @pbs.elixir-czech.cz # lists all of the user's running or queued jobs on all PBS servers
qstat -x -u <login> # lists the user's jobs, including finished ones
qstat -f <jobID> # lists details of a running or queued job
qstat -x -f <jobID> # lists details of a finished job
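
When only the state of a job is of interest, the detailed listing can be filtered. The following is a minimal sketch, assuming the `qstat -f` output contains a line of the form `job_state = R`. The helper name `job_state` is hypothetical and is not part of PBS.

```shell
#!/bin/sh
# Hypothetical helper: print the job_state value found in `qstat -f`
# output read from standard input (assumes a "job_state = X" line).
job_state() {
    awk -F' = ' '$1 ~ /^[[:space:]]*job_state$/ { print $2; exit }'
}

# Example usage (requires a PBS frontend, so it is left commented out):
#   qstat -x -f 12345 | job_state
```

The function prints nothing if no `job_state` line is present, so an empty result usually means the job ID was not found in the listing.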

Tracking running jobs

Follow these steps to check the output of a job that has not finished yet:

1. Find out which machine your job is running on, e.g. with PBSmon (https://metavo.metacentrum.cz/pbsmon2/person).

2. Log in to that machine from any frontend with the ssh target_machine command, e.g.:

ssh zapat112.cerit-sc.cz

3. Navigate to the /var/spool/pbs/spool/ directory and examine the files:

  • $PBS_JOBID.OU for standard output (stdout – e.g., “1234.arien-pro.ics.muni.cz.OU”)
  • $PBS_JOBID.ER for standard error output (stderr – e.g., “1234.arien-pro.ics.muni.cz.ER”)
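
The file names from step 3 can be assembled from the job ID. This is a minimal sketch, assuming the spool directory and the `.OU`/`.ER` suffixes are exactly as described above. The helper name `spool_paths` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the spool paths of the stdout and stderr
# files for the full job ID given as the first argument.
spool_paths() {
    jobid=$1
    printf '%s\n' \
        "/var/spool/pbs/spool/${jobid}.OU" \
        "/var/spool/pbs/spool/${jobid}.ER"
}

# Example usage on the machine where the job runs (left commented out):
#   tail -f "$(spool_paths 1234.arien-pro.ics.muni.cz | head -n 1)"
```

Using `tail -f` on the first path follows the standard output as the job writes it. The second path does the same for the standard error output.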

Examine job's standard output and standard error output

Note: You will be notified by email when your job is done if you submitted it with the qsub option -m abe.

When a job ends, whether successfully or not, two files are created in the directory from which you submitted it. One holds the standard output and the other the standard error output:

 <job_name>.o<jobID> # contains the job's standard output
 <job_name>.e<jobID> # contains the job's standard error output

You can examine these files to find results, error messages, and other output:

 cat ./myjob.sh.o12345   # shows job's standard output
 cat ./myjob.sh.e12345   # shows job's standard error output
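
The two file names can also be derived in a script. The sketch below assumes, as in the example above, that the file name carries only the numeric part of the job ID, so a full ID such as 12345.arien-pro.ics.muni.cz is cut at the first dot. The helper name `result_files` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the expected stdout and stderr file names
# for a job name (first argument) and a job ID (second argument).
result_files() {
    job_name=$1
    seq=${2%%.*}   # assumption: keep only the part before the first dot
    printf '%s\n' "${job_name}.o${seq}" "${job_name}.e${seq}"
}

# Example usage in the submission directory (left commented out):
#   for f in $(result_files myjob.sh 12345.arien-pro.ics.muni.cz); do
#       [ -s "$f" ] && echo "$f is not empty"
#   done
```

Checking whether the `.e` file is non-empty is a quick first test for whether the job wrote any error messages.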

Job termination

You can force a job to terminate from the terminal with the qdel command and the job's ID. For example:

qdel 12345 # terminates the job with ID 12345