Matlab
Description
MATLAB is an integrated system covering tools for symbolic and numeric computations, analyses and data visualizations, modeling and simulations of real processes etc. In addition to MATLAB and Simulink, supplementary toolboxes are available (see also the Licences section below).
Notice: Alternatively, you can use a graphical version of Matlab in Kubernetes (VNC needed), see https://docs.cerit.io/docs/matlab.html . It is available for testing; please report any problem with Kubernetes directly to k8s@ics.muni.cz. Other options are Matlab via Jupyter Notebook in CERIT-SC at https://hub.cloud.e-infra.cz/hub/ (an image with Matlab version 2022b is available; documentation at http://docs.cerit.io/docs/jupyterhub.html ) and Matlab Desktop via OnDemand at https://ondemand.grid.cesnet.cz/pun/sys/dashboard/apps/index (documentation at https://wiki.metacentrum.cz/wiki/OnDemand ).
Licences
There exists a permanent licence type "College" (for operating systems UNIX and MS Windows) that is available to all the national grid infrastructure MetaCentrum users as well as to all students and employees of
- Masaryk University in Brno
- University of West Bohemia in Pilsen
- Czech Technical University in Prague
MATLAB can be installed freely on every computer at these universities and once running it takes licences from a pool of available licences. These licences are currently maintained by three licence servers: at ZČU in Plzeň, at ÚVT UK in Praha and at ÚVT MU in Brno.
Names and number of licences:
The permanent licence comes with complete maintenance (including updates to new versions of all the products mentioned), which is renewed annually.
A list of licences and their current usage can be obtained by
$ /software/matlab-9.8/etc/lmstat -a | grep "in use"
Licences and scheduler
You need to tell the PBS scheduler that the job will require a licence. Each Matlab package has its own licence. For example, if you need to use Statistics_Toolbox, submit your job like this:
qsub ... -l matlab=1 -l matlab_Statistics_Toolbox=1 ...
Toolbox names with the prefix matlab_ are required by the PBS scheduling system for licence reservation. Inside MATLAB, do not use this prefix; use only the base name of the selected toolbox.
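The prefix rule above can be sketched as a small shell helper. This is only an illustration: the function name matlab_licence_opts is hypothetical (it is not provided by MetaCentrum), and myjob.sh stands for whatever job script you submit. The helper prints the options instead of calling qsub, so it can be tried safely anywhere.

```shell
#!/bin/bash
# matlab_licence_opts TOOLBOX...   (hypothetical helper, not part of MetaCentrum)
# Takes toolbox base names as used inside MATLAB and prints the
# matching PBS licence options, adding the matlab_ prefix PBS expects.
matlab_licence_opts() (
    opts="-l matlab=1"                       # the base MATLAB licence
    for toolbox in "$@"; do
        opts="$opts -l matlab_${toolbox}=1"  # one licence per toolbox
    done
    printf '%s\n' "$opts"
)

# Print (do not submit) the resulting command line for a job script
echo "qsub $(matlab_licence_opts Statistics_Toolbox) myjob.sh"
```

Add any further resource requests (select, walltime, ...) to the printed command as usual.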
Usage
Upcoming modulesystem change alert!
Due to the large number of applications and their versions, it is not practical to keep them explicitly listed on our wiki pages. Therefore an upgrade of modulefiles is underway. One feature of this upgrade will be a default module for every application. This default choice does not need a version number and will load some (usually the latest) version.
You can test the new version now by adding a line
source /cvmfs/software.metacentrum.cz/modulefiles/5.1.0/loadmodules
to your script before loading a module. Then, you can list all versions of matlab and load default version of matlab as
module avail matlab/   # list available modules
module load matlab     # load (default) module
If you wish to stay with the current system, it is still possible. Simply list all modules by
module avail matlab
and choose the explicit version you want to use.
Matlab can be accessed as
- standard SW module:
  - matlab-9.11 - MATLAB version 9.11 (2021b)
  - current default version (module matlab or matlab-9.10) is MATLAB version 9.10 (2021a)
  - older versions: matlab-9.9, matlab-9.8, matlab-9.7, ...; further older versions are also available (you can list modules by module avail matlab)
  - Notice: This application uses or needs a GUI (graphical interface). To use the application in graphical mode see Remote desktop or X-Window.
- through a web browser as an interactive job via the OnDemand service
Documentation
From the MATLAB command window one can use the command help to get help with a particular command, e.g.
>> help rand
In the desktop environment one can also use the command doc
>> doc rand
Online documentation is available at MATLAB Product Documentation.
Supported platforms
Current version supports the following operating systems: Solaris, HP-UX, Linux, MacOS-X and MS Windows, older versions are available also on: Digital Unix (Tru64), Irix and AIX.
Homepage
Tips and detailed HOWTO
Matlab as interactive job
Interactive regime brings no significant speed-up compared to running MATLAB locally on your machine unless parallelism is used. Interactive regime is recommended for developing and testing your code.
1. OnDemand interface
This is the easiest and most straightforward way to run Matlab in graphical mode. Follow the OnDemand tutorial.
2. Interactive job with remote desktop
If you prefer to stay with the command line or if, for some reason, you cannot use a web browser, it is possible to run Matlab as an interactive job and get the graphical output to your screen by configuring remote desktop (recommended) or X-Window. Follow the tutorials to learn how to set up the graphical connection.
After the graphical connection is set up, Matlab can be run from the provided menu or by typing the following code into a terminal window:
$ module add matlab
$ matlab
3. Interactive job without GUI
It is possible to run MATLAB in the text regime only, when the graphical mode is not necessary (you may even create figures, work with them and save them to a disk, though they are not visible). The relevant options for this case are:
$ matlab -nosplash -nodisplay -nodesktop
Matlab as batch job
If you do not need the graphical environment, it is possible to run MATLAB in batch regime. Create a batch script myjob.sh with the following contents:
#!/bin/bash
# set PATH to find MATLAB
module add matlab
# go to my working directory
cd $HOME/matlab/
# run MATLAB
matlab -nosplash -nodesktop -nodisplay -r "myFunction()"
# or in a different way:
#matlab -nosplash -nodesktop -nodisplay < myFunction.m > output.txt
Put all your MATLAB files (your *.mat and *.m files) into the directory $HOME/matlab/ and submit this shell script with
qsub -l select=1:ncpus=10:mem=1gb -l matlab=1 myjob.sh
Batch jobs are useful especially if you want to run more jobs in parallel or if you do not want to block your local machine with running jobs.
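When MATLAB is started with -r in a batch job, it is common to end the statement with an explicit exit, so that MATLAB quits when the function finishes and the job script can see whether it failed. The following is a minimal sketch under that assumption; matlab_r_code is a hypothetical helper name, and myFunction is the placeholder function from the script above. The helper only prints the MATLAB statement and does not start MATLAB itself.

```shell
#!/bin/bash
# matlab_r_code FUNCTION   (hypothetical helper)
# Prints a MATLAB statement that calls FUNCTION, shows the error report
# if it fails, and quits MATLAB with exit status 1 on error, 0 otherwise.
matlab_r_code() (
    printf 'try, %s(); catch err, disp(getReport(err)); exit(1); end; exit(0);\n' "$1"
)

# Show the statement that would be passed to matlab -r
matlab_r_code myFunction
```

In myjob.sh the MATLAB line could then read matlab -nosplash -nodesktop -nodisplay -r "$(matlab_r_code myFunction)".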
Turn off Java Virtual Machine (JVM)
MATLAB uses its own Java Virtual Machine for the desktop environment. Many jobs can be sped up by turning the JVM off (option matlab -nojvm). Keep in mind, however, that some internal functions and toolboxes (e.g. Distributed Computing Toolbox) need Java.
Start Matlab with Maple symbolic environment
Since Maple's symbolic environment is not fully compatible with Matlab's symbolic environment, the Matlab symbolic environment is used by default when Matlab is started within the MetaCentrum environment.
If the Maple symbolic environment is needed, it must be requested explicitly:
$ module add matlab
$ matlab-sym-maple # the options of this command are the same as for the original 'matlab' command
# an alternative -- the 'matlab-sym-matlab' command -- explicitly requires the Matlab symbolic environment
Distributed and parallel jobs in MATLAB
MATLAB now supports distributed and parallel jobs on multiprocessor machines using the Parallel Computing Toolbox (the name of the licence is Distrib_Computing_Toolbox) and on clusters using the MATLAB Distributed Computing Server (name of the licence is MATLAB_Distrib_Comp_Engine).
Parallel Matlab computations in MetaCentrum
To prepare an environment for parallel computations, it is necessary to initialize a parallel pool of so-called workers using the function parpool (called matlabpool in previous versions). This standard initialization requires specifying the number of workers to initialize; moreover, because of shared filesystems, it may result in a collision when several pools are initialized simultaneously.
- To make the initialization of the parallel pool easier and to cope with the collision problems, we have prepared a function MetaParPool:
MetaParPool('open'); % initializes parallel pool (returns the number of initialized workers)
...
x = MetaParPool('size'); % allows to discover the size of parallel pool (returns the number of workers)
% may be called as MetaParPool('info'); as well
...
% a computation using parfor, spmd and other Matlab functions
...
MetaParPool('close'); % closing the parallel pool
- Notes:
  - the function automatically detects the number of cores assigned to a job -- the size of the parallel pool is always set automatically based on the resources assigned to the job
    - it is necessary to ask for N computing cores on a single node (qsub -l select=1:ncpus=N ...)
  - parallel computations using this function require 1 Matlab licence and 1 Distributed Computing Toolbox licence
    - a reservation can thus be made via -l matlab=1,matlab_Distrib_Computing_Toolbox=1 (see Tips)
  - an example of a parallel computation using the MetaParPool function can be found in the /software/matlab-meta_ext/examples directory (the file example-parallel.m shows the Matlab input file itself, while the file run_parallel.sh shows an example startup script).
Distributed Matlab computations in MetaCentrum
Distributed computations in Matlab are commonly performed via parametrization of the internal scheduler (of type torque) -- once a computation is ready, the internal scheduler starts new ordinary PBS jobs, which perform the required work after their startup (see more information at Distributed computations using Torque). However, such behaviour may be inconvenient -- the master process (preparing the distributed computation) may wait for an indefinite time for the created PBS jobs to finish; the parametrization of the internal scheduler and the specification of nodes may be inconvenient as well.
- To make the preparation of the distributed pool easier, we have prepared a function MetaGridPool:
js = MetaGridPool('open'); % initializes distributed pool (returns the internal scheduler for preparation of the computation)
...
x = MetaGridPool('size'); % returns the size of the initialized pool (number of workers)
MetaGridPool('info'); % shows detailed information about the initialized pool
...
% distributed computation (functions createJob, createTask, wait, etc.)
...
MetaGridPool('close'); % closes the distributed pool
- Notes:
  - the function automatically detects the number of nodes and cores assigned to a job -- the size of the pool is always set based on the resources really reserved for the job (the number of workers is decreased by 1 -- a single core is reserved for the master process)
    - it is necessary to ask for X computing nodes and Y computing cores (qsub -l select=X:ncpus=Y ...)
  - distributed computations require 1 Matlab licence, 1 Distributed Computing Toolbox licence and N-1 MATLAB Distributed Computing Engine licences (N denotes the number of requested cores)
    - a reservation can thus be made via -l matlab=1,matlab_Distrib_Computing_Toolbox=1,matlab_MATLAB_Distrib_Comp_Engine=7 (here for N = 8 requested cores)
  - an example of a distributed computation using the MetaGridPool function can be found in the /software/matlab-meta_ext/examples directory (the file example-distributed.m shows the Matlab input file itself, while the file run_distributed.sh shows an example startup script).
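The licence arithmetic from the notes above can be sketched in shell. This is a hedged illustration, not a MetaCentrum tool: the helper name distrib_licence_opts is hypothetical, and it assumes that N, the number of requested cores, equals X times Y for a request of the form select=X:ncpus=Y.

```shell
#!/bin/bash
# distrib_licence_opts NODES CORES_PER_NODE   (hypothetical helper)
# Prints the licence reservation for a distributed MATLAB job, assuming
# the total number of requested cores is NODES * CORES_PER_NODE.
distrib_licence_opts() (
    total=$(( $1 * $2 ))       # N: all requested cores
    engines=$(( total - 1 ))   # one core is reserved for the master process
    printf '%s\n' "-l matlab=1,matlab_Distrib_Computing_Toolbox=1,matlab_MATLAB_Distrib_Comp_Engine=${engines}"
)

# Example: 2 nodes with 4 cores each, matching select=2:ncpus=4
distrib_licence_opts 2 4
```

For 2 nodes with 4 cores each this requests 7 engine licences, the same value as in the reservation shown in the notes.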
CPU usage
Depending on the structure of script and functions being called, Matlab may use more CPUs than granted by the scheduler. This may result in your job being killed by the scheduler.
Incorrect CPU usage can be prevented by resorting to one of the extremes:
1. Force Matlab to work on one CPU only by using the -singleCompThread option
qsub ... ncpus=1 ... matlab ... -nojvm -singleCompThread ...
The obvious disadvantage is that there is no speed-up from parallelization. For poorly (or only partly) parallelized codes, or if single-CPU speed is not an issue, -singleCompThread is often a good choice.
2. Reserve whole computational node by exclhost option
qsub -l select=1:ncpus=1 -l place=exclhost
This requirement is handled at the PBS level and essentially asks PBS to grant a whole computational node to your job only. As a result, even if the Matlab code uses all available CPUs, your job won't be killed.
On the other hand, exclhost is a large requirement. If the grid is busy, jobs with exclhost may wait for a long time (even several days!) to run.
exclhost may be effective if you have a number of Matlab jobs which use parallelization only in some critical part. You can group them into one PBS job and run them all at once in the background, e.g.
qsub ... place=exclhost ... (cd Dataset1; matlab ... ) & (cd Dataset2; matlab ... ) & ... (cd DatasetN; matlab ... )
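The grouping idea can be sketched as a small shell function for use inside the job script. This is a minimal sketch under stated assumptions: run_in_dirs is a hypothetical helper name, and the Dataset directories and the MATLAB command are placeholders taken from the schematic line above. The final wait keeps the job script running until every background run has finished.

```shell
#!/bin/bash
# run_in_dirs COMMAND DIR...   (hypothetical helper)
# Starts COMMAND in every DIR at the same time and returns only
# after all of the runs have finished.
run_in_dirs() (
    cmd=$1
    shift
    for dir in "$@"; do
        # each run gets its own subshell, so cd affects that run only
        ( cd "$dir" && sh -c "$cmd" ) &
    done
    wait    # block until all background runs are done
)

# Possible use inside a job submitted with place=exclhost:
# run_in_dirs 'matlab -nosplash -nodesktop -nodisplay -r "myFunction()"' Dataset1 Dataset2
```

Passing the command as a single string keeps the helper short; the command is handed to sh -c, so quote it carefully.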
3. Find the optimal number of CPUs using cgroups
If you need some level of parallelization, but don't want to use exclhost, you can limit the number of available CPUs by using cgroups, a Linux kernel feature. Only some machines are able to use cgroups.
qsub -l select=1:ncpus=N:cgroups=cpuset ... # N is the optimal number of CPUs
Finding the optimal number (N) of CPUs can be tricky and sometimes comes down to trial and error, especially if your code calls external libraries. You can use the top command to watch the CPU load.