Quick Start

This is a "quick start" introduction to using the HPC clusters at the University of Maryland. It covers the general activities most users will deal with when using the clusters.

MATLAB DCS Users: The system interaction for users of Matlab Distributed Computing Server is rather different from that of other users of the cluster, and so is not covered in this document. Please see Matlab DCS Quick Start.

  1. Prerequisites
  2. Logging into one of the login nodes
  3. Creating a job script
  4. Submitting a job
  5. Monitoring job status
  6. Monitoring the status of the cluster

Prerequisites for the Quick Start

This quick start assumes that you already

  1. have a TerpConnect/Glue account (REQUIRED for the Deepthought clusters, advisable for the others)
  2. have an account on/access to the cluster (i.e., have an allocation or have been granted access to someone's allocation)
  3. know how to use ssh
  4. have at least a basic familiarity with Unix

If not, follow the above links before proceeding with this quick start.

Logging into one of the login nodes

All of the clusters have at least 2 nodes available for users to log into. From these nodes you can submit and monitor your jobs, look at results of the jobs, etc.

WARNING
DO NOT RUN computationally intensive processes on the login nodes! Doing so violates policy and interferes with other users of the clusters, and such processes will be killed without warning. Repeated offenses can lead to suspension of your privilege to use the clusters.

For most tasks you will wish to accomplish, you will start by logging into one of the login nodes for the appropriate cluster. These are:

Cluster          Login Node                  Examples
Deepthought      login.deepthought.umd.edu   ssh johndoe@login.deepthought.umd.edu
                                             ssh -l johndoe login.deepthought.umd.edu
Deepthought2     login.deepthought2.umd.edu  ssh johndoe@login.deepthought2.umd.edu
                                             ssh -l johndoe login.deepthought2.umd.edu
MARCC/Bluecrab   gateway2.marcc.jhu.edu      ssh "johndoe@umd.edu"@gateway2.marcc.jhu.edu
                                             ssh -l johndoe@umd.edu gateway2.marcc.jhu.edu

Note: For the MARCC/Bluecrab cluster, since your username is something like johndoe@umd.edu, you will need to either use the -l argument to the standard Unix ssh command or quote the username in the user@host format, as shown above.

See the section on logging into the clusters for more information.

Creating a job script

Next, you'll need to create a job script. This is just a simple shell script that will specify the necessary job parameters and then run your program.

Here's an example of a simple script, which we'll call test.sh:

#!/bin/bash
#SBATCH -t 1
#SBATCH -n 4
#SBATCH --mem-per-cpu=128
#SBATCH --share

. ~/.profile
module load python/2.7.8

hostname
date

The first line, the shebang, specifies the shell to be used to run the script. Note that you must have a shebang specifying a valid shell in order for Slurm to accept and run your job; this differs from Moab/PBS/Torque, which ignores the shebang and runs the job in your default shell unless you give qsub an option specifying a different shell.

The next four lines specify parameters to the scheduler.

The first, -t, specifies the maximum amount of time you expect your job to run. It can take various forms, but usually you will want to give minutes, hours:minutes:seconds, or days-hours. You should always set a reasonable wall time limit; this will help improve utilization of the cluster and reduce the amount of time your job will wait in the queue. To encourage this, the default wall time limit is rather short. In this example, we specified a wall time limit of 1 minute; normally this would be much longer, but this is a trivial job.
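
For example, each of the following lines requests a two hour wall time limit, written as minutes, as hours:minutes:seconds, and as days-hours respectively (use only one -t line in a given job script):

#SBATCH -t 120
#SBATCH -t 2:00:00
#SBATCH -t 0-2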

See the section on specifying the walltime limit for more information.

The second line, -n, tells the scheduler how many tasks/cores your job will use (by default Slurm assigns a distinct core to each task). We do not specify how Slurm should distribute these cores across machines, so Slurm can distribute them however it sees fit. That is usually sufficient for many MPI jobs, and there are other options that allow for very detailed specification of how the cores should be distributed, as briefly described here and in the examples page.

In this example, we are requesting 4 cores (which is way more than needed for this trivial example). Most likely we will get all 4 cores on a single node, but that is NOT guaranteed. We could possibly get one core on each of 4 nodes, or some allocation of 4 cores on 2 or 3 nodes.
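
For example, if the layout does matter to your job, you could instead request the same 4 cores but insist that they all be on a single node. This is just a sketch using the standard Slurm -N (number of nodes) and --ntasks-per-node options:

#SBATCH -N 1
#SBATCH --ntasks-per-node=4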

See the section on specifying the node/core requirements for more information.

The third line, --mem-per-cpu, tells the scheduler how much memory to allocate for your job. This particular form, --mem-per-cpu=N, reserves the requested amount of memory (N MB) per CPU core assigned. A similar form, --mem=N, reserves the requested amount of memory (N MB) for the entire job. The --mem-per-cpu form is usually more convenient. Nodes on the Deepthought cluster should have at least 1 GB/core. On the Deepthought2 cluster, nodes have at least 6 GB/core. For the MARCC/Bluecrab cluster, nodes have at least 5 GB/core.

In this example, we are requesting 128 MB per core, for a total of 512 MB for our 4 core job. If we used --mem=128 we would get a total of 128 MB (or effectively 32 MB per core), which for this trivial job is still way more than is actually needed.
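
To make the contrast concrete for this 4 core example, the first line below requests 128 MB per core (512 MB in total), while the second requests 128 MB in total (roughly 32 MB per core); use only one memory option per job:

#SBATCH --mem-per-cpu=128
#SBATCH --mem=128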

See the section on specifying the memory requirements for more information.

The fourth line, --share, is valid for the Deepthought HPC clusters, and states that we are willing to share a node with other jobs. E.g., on Deepthought2, most nodes have 20 cores; by using --share mode, if all of our cores are assigned to one such node, Slurm will reserve 4 cores for us, but can assign the other 16 cores to other jobs while our job is running. The opposite is --exclusive, which prevents other jobs from running on the same node(s) as the exclusive job. If our sample job were --exclusive and assigned to a 20 core node, the other 16 cores would be unassigned and idle while the job ran.

NOTE: exclusive jobs get charged for both the cores they use AND for the cores they prevent from being used by anyone else due to the exclusive status. E.g., if the example job was --exclusive and assigned a 20 core node, it would accrue charges for 20 cores for as long as it ran.

By default, jobs requesting only a single core are run in --share mode, and those requesting more than one core are run in --exclusive mode. But you can override this with the --share and --exclusive flags.
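
To set the mode explicitly on the Deepthought clusters, include one (and only one) of the following lines in your job script:

#SBATCH --share
#SBATCH --exclusive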

See the section on specifying whether other jobs can be on the same node for more information.

For the MARCC/Bluecrab cluster, the shared/exclusive mode is determined by the partition your job is submitted to, and in general you will need to specify a partition depending on the type of job you wish to run. By default, jobs will go to the shared partition, which is suitable for serial jobs or jobs doing shared memory parallelization. "Shared" partition jobs are restricted to a single node, and as the name implies, run in shared mode. The parallel partition is suitable for parallel jobs requiring multiple nodes, and enforces --exclusive mode. On MARCC/Bluecrab, you should just specify the partition (with the -p option), and not specify the shared/exclusive mode, for example:

#!/bin/bash
# Bluecrab example
#SBATCH -t 1
#SBATCH -n 4
#SBATCH -p shared

. ~/.profile
module load python/2.7.9

hostname
date

See the section on MARCC/Bluecrab partitions for more information.

Users of the two Deepthought HPC clusters generally should NOT be specifying a partition (the only real exceptions being if you wish to use the debug or the scavenger partitions); Slurm will automatically select the appropriate partition for you.

It is advisable to include at least these four options (wall time limit, number of cores, memory, and either exclusivity or partition depending on the cluster) for all jobs, either in the job script as shown, or on the sbatch command line (see the section on providing options to the sbatch command for general information). There are many other possible arguments to the sbatch command; the more commonly used ones are described here.
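
For example, the same options can be given on the sbatch command line when the job is submitted; options given on the command line generally take precedence over the corresponding #SBATCH lines in the script:

login-1:~: sbatch -t 1 -n 4 --mem-per-cpu=128 --share test.sh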

The remaining lines in the file are just standard commands; you will replace them with whatever your job requires. In this case, once the job runs, it will print out the hostname and date to the output file. The script will be run in whatever shell is specified by the shebang on the first line of the script. NOTE: unlike with the Moab scheduler, you MUST provide a valid shebang on the first line.

Note that when your job starts, your job script is executed on the first node assigned to your job. The list of nodes assigned to your job, etc., is available in Slurm environment variables, but Slurm does not do anything to parallelize your job. Your script is responsible for farming out tasks to the different cores/nodes that are part of the job. Normally, a parallel application will handle that itself, or you launch your MPI-aware code with mpirun, which handles that for you.
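
As a rough sketch only (the module name openmpi and the program my_mpi_program are placeholders; the module to load and the exact launch command depend on the MPI stack installed on your cluster), an MPI job script might look something like:

#!/bin/bash
#SBATCH -t 30
#SBATCH -n 40
#SBATCH --mem-per-cpu=1024

. ~/.profile
module load openmpi

# mpirun typically picks up the list of assigned nodes/cores from Slurm
# and starts one MPI task on each assigned core
mpirun ./my_mpi_program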

See the section on running MPI jobs for more information.

In particular, note that the example given is BAD. Although it requests 4 cores, all the commands listed (hostname, date) are single core commands, so 3 of the requested cores will actually be idle while the job is running. Since this job is just a simple example and will finish in seconds, that is not a big issue in this case. But in general, simply submitting serial code as an sbatch job requesting more than one core DOES NOT parallelize the job.

For users of the Deepthought HPC clusters: If your job script uses bash and that is NOT your default shell, you should begin the code section of your script with

. ~/.profile
to set up your environment properly. In particular, this sets up the module command.

Generally, this should be followed by module loads of whatever modules your job requires.
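
For example, the module command can also be run interactively on a login node before you add the corresponding load commands to your job script; module avail lists the modules available on the cluster, and module list shows what is currently loaded (python/2.7.8 is just the module used in the example above; load whatever modules your code actually needs):

module avail
module load python/2.7.8
module list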

See the section on using the module command for more information.

It is recommended that you include the relevant module commands for a job in the actual job script, as opposed to relying on modules loaded by your dot files.

For more information than is suitable for a quick start document, follow the links referenced throughout this section.

Submitting a job

Now that you have a job script, you need to submit the job to the cluster with the sbatch command. For example,

login-1:~: sbatch test.sh 
Submitted batch job 13222

The number that is returned is the identifier for the job. Use it any time you want to find out more information about your job, and include it if you open a help ticket about a job.

WARNING
Do NOT start jobs from your home directory. It is NOT optimized for heavy I/O; use lustre or scratch space instead. See the section on storage for more information on storage options.

At this point, your job has been placed in the queue, and will wait its turn for resources to become available. Depending on how heavily used the cluster is at the time, and how many resources you are requesting, your job might start within minutes or it might wait for hours or even days. (And this assumes that there are sufficient funds in the allocation, etc.) See the FAQ for tips on how to reduce the amount of time your job spends waiting in the queue.

Once resources become available, Slurm will assign resources to your job, including one or more cores on one or more nodes. A shell process will start on the first core of the first node assigned, and your script will run. Normally, your script will start any other tasks on the same or on other nodes as needed.

The standard output and standard error streams will be directed to a file, by default slurm-NNNN.out in the directory where you submitted the job, where NNNN is the job number described above. See the section on specifying output options for more information.
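
If you want the output to go to a file name of your own choosing instead, you can add something like the following to your job script; %j is replaced by the job number, and the -e option can similarly be used to send standard error to a separate file:

#SBATCH -o myjob-%j.out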

Output from your job can be viewed in the above-specified file shortly after the job starts running (assuming it has output something). This can be used to check the status of your job, although if your code generates a lot of output it is advisable to redirect that output to another file.

For our trivial example from the last section, when the job completes we should see something like

l:~: cat slurm-13222.out
compute-2-39.deepthought.umd.edu
Wed May 21 18:38:06 EDT 2014

As you can see in the output file above, the script ran and printed the hostname and date as specified by the job script.

Monitoring job status

The basic command for monitoring your jobs' status is the squeue command. Because you are normally only interested in your own jobs, it is advisable to add the -u USERNAME flag, which speeds up the command and shows only your jobs. Replace USERNAME with your username (and remember the @umd.edu on Bluecrab).
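
For example, using the hypothetical username johndoe (you can also restrict the output to a single job with the -j flag and the job number reported by sbatch):

login-1:~: squeue -u johndoe
login-1:~: squeue -j 13222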

See the section on monitoring jobs for more information than is suitable for a quick start document.

Monitoring the status of the cluster

It is often useful to be able to see the status of the cluster as a whole, including information about how busy the cluster is at a given point in time.

The squeue command without any arguments will list all jobs in the queue. This can be overwhelming, however, as there are often many, many jobs.

The sinfo -N command can show you information about the nodes in the cluster. Again, this is dense text output, so it can be difficult to process.

The smap command uses ASCII graphics to present this information in a more graphical and hopefully more digestible fashion.

The sview command uses X11 graphics for an even prettier overview of the cluster.
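
All of the above are run from a login node; their output is omitted here since it depends on the state of the cluster at the time you run them (and sview requires an X11-capable connection, e.g. logging in with ssh -X):

login-1:~: squeue
login-1:~: sinfo -N
login-1:~: smap
login-1:~: sview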

More information about the commands above can be found in the section on monitoring the cluster.