Matlab

Summary and Version Information

Package: Matlab
Description: Matlab
Categories: Numerical Analysis
Version | Module tag   | Availability*                                                                    | GPU Ready | Notes
--------|--------------|----------------------------------------------------------------------------------|-----------|------
2009b   | matlab/2009b | Non-HPC Glue systems (All OSes)                                                  | Y         |
2010b   | matlab/2010b | Non-HPC Glue systems, Evergreen HPCC (Linux)                                     | Y         |
2011a   | matlab/2011a | Non-HPC Glue systems, Bswift HPCC (Linux)                                        | Y         |
2011b   | matlab/2011b | Non-HPC Glue systems, Evergreen HPCC, Bswift HPCC (Linux)                        | Y         |
2012b   | matlab/2012b | Non-HPC Glue systems, Bswift HPCC (Linux)                                        | Y         |
2013b   | matlab/2013b | Non-HPC Glue systems (RedHat6)                                                   | Y         |
2014a   | matlab/2014a | Non-HPC Glue systems (RedHat6)                                                   | Y         |
2014b   | matlab/2014b | Non-HPC Glue systems, Deepthought HPCC, Bswift HPCC, Deepthought2 HPCC (RedHat6) | Y         |
2015a   | matlab/2015a | Non-HPC Glue systems (RedHat6)                                                   | Y         |
2015b   | matlab/2015b | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC (RedHat6)              | Y         |
2016a   | matlab/2016a | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC (RedHat6)              | Y         |
2016b   | matlab/2016b | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC (RedHat6)              | Y         |
2017a   | matlab/2017a | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC (RedHat6)              | Y         |

Notes:
*: A package labelled as "available" on an HPC cluster can be used on the compute nodes of that cluster. Software not listed as available on an HPC cluster is generally still available on that cluster's login nodes (assuming it is available for the appropriate OS version, e.g. RedHat Linux 6 for the two Deepthought clusters). This is because the compute nodes do not use AFS and instead have local copies of the AFS software tree, to which we only install packages as requested. Contact us if you need a version listed as not available on one of the clusters.

In general, you need to prepare your Unix environment to be able to use this software. To do this, either:

  • tap TAPFOO
OR
  • module load MODFOO

where TAPFOO and MODFOO are one of the tags in the tap and module columns above, respectively. The tap command will print a short usage text (use -q to suppress this; this is needed in startup dot files); you can get a similar text with module help MODFOO. See the documentation on the tap and module commands for more information.

For packages which are libraries which other codes get built against, see the section on compiling codes for more help.

Tap/module commands listed with a version of current will set up what we consider the most current stable and tested version of the package installed on the system. The exact version is subject to change with little if any notice, and might be platform dependent. Versions labelled new represent a newer version of the package which is still being tested by users; if stability is not a primary concern, you are encouraged to use it. Those with versions listed as old set up an older version of the package; you should only use these if the newer versions are causing issues. Old versions may be dropped after a while. Again, the exact versions are subject to change with little if any notice.

In general, you can abbreviate the module tags. If no version is given, the default current version is used. For packages with compiler/MPI/etc. dependencies, if a compiler module or MPI library was previously loaded, module will try to load the build of the package matching that compiler or MPI library. If you specify the compiler/MPI dependency explicitly, it will attempt to load the corresponding compiler/MPI module for you if needed.
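
For example, from the Unix shell (a minimal sketch; the version tag is from the table above):

module load matlab          # sets up the default (current) version
module load matlab/2017a    # sets up a specific version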

Running a MATLAB script from the command line

While most people use MATLAB interactively, there are times when you might wish to run a MATLAB script from the command line, or from within a shell script. Usually in this situation you have a file containing MATLAB commands, one command per line, and you want to start MATLAB, run the commands in that file, and save the output to another file, without the MATLAB GUI starting up (often the process will be running in a fashion where there is no screen readily available to display the GUI).

WARNING
If you are running Matlab jobs on one of the Deepthought high-performance computing clusters, please include a #SBATCH -L matlab directive near the top of your job script. This is because we have been having issues with HPC users depleting the campus Matlab license pool. The above directive will ask Slurm for a matlab license, which will be used to throttle the number of simultaneous Matlab jobs running on the clusters. If all the matlab users on the cluster abide by this policy, hopefully there will be no more issues with license depletion. If such an issue occurs, we will regrettably have to kill some matlab jobs (starting with those that did NOT request a license) to free up licenses. We are hoping in the next several months to obtain a truly unlimited matlab license on campus, but until then we ask that HPC users include the above directive in their matlab jobs.
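
For example, the top of such a job script might look like (a minimal sketch; the time limit is illustrative):

#!/bin/bash
#SBATCH -t 60
#SBATCH -L matlab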

This can be broken down into several distinct parts:

  1. Get MATLAB to run without the GUI, etc.
  2. Get MATLAB to start running your script, and exit when your script is done.
  3. Get the output of the MATLAB command saved to a file.

The first part is handled with two options passed to the matlab command: -nodisplay and -nosplash. The former disables the GUI; the latter disables the MATLAB splash screen that is displayed before the GUI starts up.

The second step is handled using the -r option, which specifies a command which MATLAB should run when it starts up. You can give it any valid MATLAB command, but typically you just want to tell it to read commands from your file, and then to exit; otherwise it will just sit at the prompt waiting for additional commands. One reason to keep the command simple is that the command string has to be quoted to keep the Unix shell from interpreting it, and that can get tricky for complicated commands.

Typically, you would give an argument like matlab -r "run('./myscript.m'); exit" (and you would include the -nodisplay and -nosplash arguments before the -r if you wanted to disable the GUI as well); where myscript.m is your script file, and is located in the current working directory. The exit causes MATLAB to exit once the script completes.

The third part is handled with standard Unix file redirection.

Putting it all together, if you had a script myscript.m in the directory ~/my-matlab-stuff and you wanted to run it from a shell script, putting the output in myscript.out in the same directory, you could do something like:

#!/bin/tcsh

module load matlab
cd ~/my-matlab-stuff
matlab -nodisplay -nosplash -r "run('./myscript.m'); exit" > ./myscript.out

MATLAB and HPC

MathWorks currently provides two products to help with parallelization:

  1. Parallel Computing Toolbox (PCT): This provides support for parallel for loops (the parfor command), as well as some CUDA support for using GPUs. However, without the MATLAB Distributed Computing Server, there are limits on the number of workers that can be created, and all workers must be on the same node.
  2. MATLAB Distributed Computing Server (MDCS): This extends MATLAB desktop workflows to the cluster hardware, and allows you to submit MATLAB jobs to the cluster without having to learn anything about the cluster command line interface.

In addition, some of the built-in linear algebra and numerical functions are multithreaded.

WARNING
If you are running Matlab jobs on one of the Deepthought high-performance computing clusters, please include a #SBATCH -L matlab directive near the top of your job script. (This is NOT needed for Matlab DCS jobs.) This is because we have been having issues with HPC users depleting the campus Matlab license pool. The above directive will ask Slurm for a matlab license, which will be used to throttle the number of simultaneous Matlab jobs running on the clusters. If all the matlab users on the cluster abide by this policy, hopefully there will be no more issues with license depletion. If such an issue occurs, we will regrettably have to kill some matlab jobs (starting with those that did NOT request a license) to free up licenses. We are hoping in the next several months to obtain a truly unlimited matlab license on campus, but until then we ask that HPC users include the above directive in their matlab jobs.

Built-in multithreaded functions

A number of the Matlab built-in functions, especially linear algebra and numerical functions, are multithreaded and will automatically parallelize in that way.

This parallelization is shared memory, via threads, and so is restricted to within a single compute node. So normally your job submission scripts should explicitly specify that you want all your cores on a single node.

For example, if your matlab code is in the file myjob.m, you might use a job submission script like:

#!/bin/bash
#SBATCH -t 2:00
#SBATCH -N 1
#SBATCH -n 12
#SBATCH --mem-per-cpu=1024
#SBATCH -L matlab

. ~/.profile
module load matlab

matlab -nodisplay -nosplash -r "run('myjob.m'); exit" > myjob.out

and your matlab script should contain the line

	maxNumCompThreads(12);
somewhere near the beginning. This restricts Matlab to the requested number of cores --- if it is omitted, Matlab will try to use all cores on the node.
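
If you would rather not hard-code the thread count, you can read the allocation size from the environment Slurm provides (a sketch; it assumes the job was submitted with -n as above, so that Slurm exports SLURM_NTASKS):

% Match Matlab's thread count to the Slurm allocation.
% SLURM_NTASKS is set by Slurm when the job requests -n/--ntasks.
nslots = str2double(getenv('SLURM_NTASKS'));
if isnan(nslots)
    nslots = 1;   % not running under Slurm; fall back to one thread
end
maxNumCompThreads(nslots);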

MATLAB Parallel Toolbox

The MATLAB Parallel Toolbox allows you to parallelize your MATLAB jobs, to take advantage of multiple CPUs on either your desktop or on an HPC cluster. This toolbox provides parallel-optimized built-in MATLAB functions, including the parfor parallel loop command.

A simple example matlab script would be


% Allocate a pool
% We use the default pool, which will consist of all cores on your current
% node (up to 12 for MATLABs before R2014a)
parpool
% For MATLAB versions before R2013b, use "matlabpool open"


%Pre-allocate a vector
A = zeros(1,100000);
xfactor = 1/100;

% Assign values in a parallel for loop
parfor i = 1:length(A)
	A(i) = xfactor*i*sin(xfactor*i);
end

Assuming the above MATLAB script is in a file ptest1.m in the directory /lustre/payerle/matlab-tests, we can submit it with the following script to sbatch:

#!/bin/tcsh
#SBATCH -n 20
#SBATCH -N 1
#SBATCH -L matlab

module load matlab

matlab -nodisplay -nosplash \
	-r "run('/lustre/payerle/matlab-tests/ptest1.m'); exit" \
	> /lustre/payerle/matlab-tests/ptest1.out

You would probably want to add directives to specify other job submission parameters as well, such as the maximum wall time and the memory required.

NOTE: It is important that you specify a single node in all of the above, as without using Matlab Distributed Computing Server the parallelization above is restricted to a single node.

MATLAB Distributed Computing Server

The MATLAB Distributed Computing Server (MDCS) allows you to extend your MATLAB workflows from your desktop to an HPC cluster without having to learn the details of submitting jobs to the cluster.

The initial documentation from the MathWorks consultant is reproduced below:

Before using the Matlab Distributed Compute Server (Matlab DCS) for the first time on your computer, you will need to perform the following steps. These steps must be done once on each computer from which you plan to run Matlab and submit jobs via Matlab DCS to the Deepthought2 cluster, and repeated on a computer if you intend to use Matlab DCS with a new version of Matlab. It should only be necessary to do this once per system/Matlab version; however, it should not hurt anything to repeat the process.

  1. Most of the configuration is contained in one of two downloadable files, a zipfile and a tarball. The two files have the same contents, so you only need one or the other; the zipfile is probably most convenient for Windows systems, the tarball for Linux systems.
    1. Determine the userpath directory for Matlab on your workstation. To do this, run the userpath command in Matlab (see the example after this list). Typically, this will be one of
      • My Documents/MATLAB or Documents/MATLAB on Windows systems, or
      • ~/Documents/MATLAB or $matlab/toolbox/local on Linux systems.
    2. Untar/unzip the tarball/zipfile downloaded above and place the contents in the userpath directory determined above.
    3. You will also need a profile settings file. Select the one that matches the version of Matlab running on your workstation; you can download and install multiple settings files if desired (which might be useful if you run different versions of Matlab on the same system). Depending on your browser, you may need to right-click and choose "Save link as ..." to save these as files. If your version of Matlab is not listed, you can contact systems staff to see if Matlab DCS will work with that version of Matlab.
    4. Copy the settings file from above (deepthought2_remote_MATLAB_VERSION.settings) into the userpath directory as obtained previously. You can have multiple settings files in that directory.
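
    For example, to find the userpath directory, from the Matlab prompt on your workstation (the output shown is illustrative; yours will differ):

      >> userpath

      ans =

      /home/yourname/Documents/MATLAB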

Using Matlab DCS with the DT2 Cluster

The following is a quick guide to using Matlab DCS to submit jobs to the DT2 cluster.

    1. To start, from the matlab prompt, run the command configCluster. This will do some basic setup, ask for your username on the Deepthought2 cluster, and set jobs to run on the cluster instead of locally by default. If multiple profile settings for the version of Matlab you are running on your workstation are found, you will be prompted to select one. If no profile settings for the version of Matlab you are running are found, it will offer you a list of what was found, but chances are they will not work; go to the previous section and download a cluster profile settings file for the correct Matlab version and try again.
    2. You now need to define a "cluster" to submit jobs to. This holds the information about the parallel workers, etc. For most cases, it will suffice to enter a command like:

      >> c = parcluster;

      You can choose whatever variable you like instead of c, but if so be sure to change it in the following examples as well.

    3. You can then create and submit jobs to be run on the remote cluster. The following is a simple example:
      >> j = c.batch(@pwd, 1, {} );
      
      additionalSubmitArgs =
      
      --ntasks 1 --licenses=mdcs:1
      
      >> j.wait
      >> j.fetchOutputs{:}
      
      ans =
      
      /a/fs-3/export/home/deepthought2/mltrain
      
      >>
      >> j.delete
      >>

      The variable j holds the "job"; you can use whatever variable you like. In this case, the "job" is created when we create a batch job on our parcluster c. For this example, we are simply running the built-in pwd command; in most cases you would instead give the name of a user-defined function (e.g. the name of a "*.m" file without the ".m" extension). The 1 in the batch command means that the function is expected to return 1 output argument. The braces {} contain a list of input values for the function; in this case, pwd takes no input arguments, so we do not provide any.
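
      For instance, here is a sketch of submitting a user-defined function that takes inputs. The function mysum is hypothetical; it would live in a file mysum.m visible to the workers, containing "function s = mysum(a, b); s = a + b;":

      >> j = c.batch(@mysum, 1, {2, 3});   % 1 output, inputs a=2 and b=3
      >> j.wait
      >> j.fetchOutputs{:}

      ans =

           5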

      The submission scripts will print the additionalSubmitArgs string. These are the arguments that will be provided to the Slurm sbatch command; the web documentation on submitting jobs will have more information. As you gain experience with the system, you may wish to examine this to ensure that the job is being submitted correctly.

      The first time you submit a job to Deepthought2 in a particular Matlab session, a pop-up message will be displayed asking if you wish to "Use an identity file to login to login.deepthought2.umd.edu?". If you answer "No", you will be prompted for your password on the Deepthought2 cluster; this is the recommended response for new users. Answering "Yes" requires you to set up RSA public key authentication on the Deepthought2 login nodes; you will be prompted for the location of the identity file and asked whether the file requires a passphrase. In all cases, Matlab will remember this information (your password, or the location and/or passphrase of the identity file) for the remainder of your Matlab session.

      When you issue the batch command, a job is submitted to the scheduler to run on the Deepthought2 compute nodes. Depending on how busy the cluster is, the job might or might not start immediately, and even if it starts immediately, it will generally (except in overly simple test cases such as this) take a while to run. The j.wait command will not return until the job has completed. You might instead wish to use the c.Jobs command to see the state of all of your jobs. Although you can submit other jobs (be sure to store them in different variables) and perform other calculations while your job(s) are pending/running, you cannot examine their output until they complete.

      To examine the results of a job (after it has completed), you can use the j.fetchOutputs{:} function as shown in the example. In the above example, you can see that it returned the path to the home directory of the Matlab test account it was run from. If the job does not finish successfully, you probably will not be able to get anything useful from the fetchOutputs function. In such cases, you should look at the error logs (which can be lengthy) using the getDebugLog function. There are separate logs for each worker in the job, so you will need to do something like:

      >> j.Parent.getDebugLog(j.Tasks(1))

      Note: The fetchOutputs function will only return the values returned from the function you called; data which has been written to files will not be returned. For such data, you will need to manually log into the Deepthought2 cluster to retrieve the information.

      The above example is unrealistically simple. In practice, you will generally need to set some more job parameters --- although Matlab DCS hides some of the complexity of submitting jobs to an HPC cluster from the user, it cannot hide all of it. In general, the settings for your job will be obtained from the ClusterInfo object in Matlab. You can use the command ClusterInfo.state() to see all of the current settings, and in general the commands ClusterInfo.getFOO() and ClusterInfo.setFOO(VALUE) can be used to query the value of a particular setting FOO, or to set it to VALUE. Notable fields are:

      • WallTime: this sets the maximum wall time for the job. If not set, the default is 15 minutes, which is probably too short for real jobs. This can be given using one of the following formats:
        • MINUTES
        • DAYS-HOURS:MINUTES
        • DAYS-HOURS
        • HOURS:MINUTES:SECONDS
      • MemUsage: this sets the memory per CPU-core/task to be reserved. This should be given as a number of MB per core.
      • ProjectName: this specifies the allocation account to which the job will be charged. Your default allocation account will be charged if none is specified.
      • QueueName: this specifies the partition the job should run on. Normally you will not wish to set this unless you wish to run on the debug or scavenger partitions.
      • UseGpu: If you wish for your job to use GPUs, you should set this to the number of GPUs to use. That will cause Slurm to schedule your job on a node with GPUs; additional work may be needed to get Matlab to actually use the GPUs.
      • EmailAddress: if set, it will cause Slurm to send email to the address provided on all job state changes. The default is not to send any email.
      • Reservation: if set, the job will use the specified reservation.
      • UserDefinedOptions: This is a catch-all for any other options you need to provide to Slurm for your job. You should just present sbatch flags as you would on the command line. E.g., to specify that you wish to allow other jobs to run on the same node as your job, you can provide the value --share. You can provide multiple Slurm arguments in this string by just putting spaces between the arguments in the string.
      The following example shows how to set a wall time of 4 hours and request 4 GB/core (4096 MB/core):

        >> ClusterInfo.setWallTime('4:00:00')
        >> ClusterInfo.setMemUsage('4096')
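
        Putting a few of these together (a sketch; the setter names follow the ClusterInfo.setFOO pattern described above, and the values shown are illustrative):

        >> ClusterInfo.setWallTime('1-12:00')           % 1 day, 12 hours
        >> ClusterInfo.setMemUsage('2048')              % 2 GB per core
        >> ClusterInfo.setEmailAddress('user@umd.edu')  % illustrative address
        >> ClusterInfo.state()                          % review all current settings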