FAQ/Application

I cannot run queen properly on the testing example (see /home/pavelm/queen/test)

When I run queen locally in the directory /home/pavelm/queen/test (with the command "queen --Iuni example noe"), the job runs properly. When I queue the job, however, it fails with the error message: "ERROR - Dictionary /home/pavelm/queen/test/queen.conf could not be read. Script has been stopped." I tried to run the script both with psubmit and with qsub in the normal queue ("qsub -q normal run" or "psubmit normal run"). The script contains both "metamodule add queen" and the setting of the path to the configuration file in the environment variable QUEEN_CONF ("QUEEN_CONF=/home/pavelm/queen/test/queen.conf;export QUEEN_CONF"). The script to execute is the following:

metamodule add queen
Q_PROJECT=/home/pavelm/queen/test/
export Q_PROJECT
QUEEN_CONF=/home/pavelm/queen/test/queen.conf
export QUEEN_CONF
#export Q_PROJECT=/home/pavelm/queen/test/
#export QUEEN_CONF=/home/pavelm/queen/test/queen.conf
queen --Iuni example noe

Unfortunately, I cannot find out why my configuration file cannot be read.

The path set in QUEEN_CONF must be accessible from the machine where PBS runs your job (in the case of parallel jobs, from all its nodes). The path /home/pavelm/... is local to each cluster (it leads to different disks on different clusters). Either put queen.conf into your directory in AFS (/afs/ics.muni.cz/home/pavelm, or /home/pavelm/shared), which is visible from all MetaCentrum machines, or limit the job to machines that have access to your directory, e.g. with the PBS option -l brno (-l ...:brno) if your home directory is on skirit or perian. A corrected script is sketched below.
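
A minimal sketch of a corrected job script, assuming the project has been copied under the AFS home directory visible from all MetaCentrum machines (the AFS path below mirrors the original layout and is illustrative):

metamodule add queen
# the AFS copy of the project is visible from every node the job may run on
export Q_PROJECT=/afs/ics.muni.cz/home/pavelm/queen/test/
export QUEEN_CONF=/afs/ics.muni.cz/home/pavelm/queen/test/queen.conf
queen --Iuni example noe

Alternatively, keep the original /home paths and restrict the job with the -l ...:brno property mentioned above.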

For the program to run efficiently, you have to put the "log" subdirectory of the assignment (item Q_LOG in queen.conf) on the fastest available storage, e.g. on a scratch volume (in the directory dedicated to the job, identified by the $SCRATCHDIR environment variable) or on a /storage volume, as sketched below.
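
A hedged sketch of pointing Q_LOG at the job's scratch directory; the sed substitution and the exact queen.conf item syntax are assumptions, so adjust them to the actual format of the file:

# create a log directory in the scratch space dedicated to the job
mkdir -p $SCRATCHDIR/log
# point the Q_LOG item of queen.conf there (illustrative substitution)
sed -i "s|^Q_LOG.*|Q_LOG=$SCRATCHDIR/log|" $QUEEN_CONF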

I pointed Q_LOG at $SCRATCHDIR ... Can I apply the use of $SCRATCHDIR described above to a parallel job in the program xplor-nih?

According to the documentation (/afs/ics.muni.cz/software/xplor-nih-2.20/parallel.txt), you need a directory shared by all used nodes when running xplor in parallel mode. The best way is to use the shared scratch volume (available only on the mandos cluster; the dedicated space is identified by the $SCRATCHDIR environment variable) or an NFSv4 volume within the /storage directory. Note that QUEEN must be able to read the assignment from all nodes (e.g. it can be copied to each node just for reading), but for the log subdirectory it is sufficient to be local on each node, because parts of the job run in it as separately started xplor instances that communicate with the local queen processes through files. The result of the whole job is written only by the main process. A sketch of the data layout follows.
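
A minimal sketch of the data layout, assuming the shared scratch volume on mandos is used so that a single copy is readable from every node (the file name noe.tbl is purely illustrative):

# one copy of the assignment on the shared scratch volume is readable
# from all nodes of the parallel xplor-nih run
cp $Q_PROJECT/noe.tbl $SCRATCHDIR/
# the log directory may live there as well (or node-locally); the xplor
# instances started by queen exchange files with the local queen processes here
mkdir -p $SCRATCHDIR/log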

Gaussian

Memory allocation in Gaussian version G03.E01 does not work. What should I do?

If you get the warning "buffer allocation failed in ntrext1." right after starting the program, use a directory on a scratch volume (the directory is identified by the $SCRATCHDIR environment variable) as the working directory instead of /home (the new version contains a bug that manifests itself depending on the type of file system). A minimal example follows.
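
A minimal sketch, assuming the input file is named input.com (illustrative) and that the job script changes into the scratch directory before starting Gaussian:

# run from the scratch volume instead of /home to avoid the
# "buffer allocation failed in ntrext1." failure
cd $SCRATCHDIR
export GAUSS_SCRDIR=$SCRATCHDIR
g03 input.com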

I think there is a problem with the node orca14-2. My Gaussian computations crash immediately after starting, with the error message (each time in a different part of the log file) "Erroneous write. Write 8192 instead of 12288."

The reason for this failure is that the disk is full of undeleted Gaussian auxiliary files. These files are left over from your previous computations that did not finish successfully. You can avoid this situation by deleting all auxiliary files after the computation ends. The easiest way is to create a separate directory for every single computation and to remove it at the end of the computation:

# use a dedicated scratch directory for this computation and export it
# so that Gaussian picks it up from the environment
export GAUSS_SCRDIR=$SCRATCHDIR
export GAUSS_ARCHDIR=$GAUSS_SCRDIR
mkdir -p $GAUSS_SCRDIR
...
g03 ...
# remove the auxiliary files once the computation has finished
rm -rf $GAUSS_SCRDIR