Slurm Environment Variables
When a job scheduled by Slurm starts, it needs to know certain
things about how it was scheduled, e.g. what its working
directory is, or which nodes were allocated to it. Slurm passes this
information to the job via environment variables. In addition to
being available to your job, these variables are also used by programs like
mpirun to set default values. This way, something like
mpirun already knows how many tasks to start and on which
nodes, without you needing to pass this information explicitly.
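As a rough illustration of how a job script might read this information, the sketch below prints a few of the variables Slurm sets. The function name describe_job and the "unset" fallbacks are assumptions added here so that the snippet also runs outside of a Slurm job; they are not part of Slurm itself.

```shell
#!/bin/bash
# Sketch: report how Slurm scheduled this job.
# describe_job is a hypothetical helper name. The ${VAR:-unset}
# fallbacks are an assumption so the function also works when the
# variables are missing (e.g. when run outside of a Slurm job).
describe_job() {
    echo "Job ID:     ${SLURM_JOB_ID:-unset}"
    echo "Submit dir: ${SLURM_SUBMIT_DIR:-unset}"
    echo "Node list:  ${SLURM_JOB_NODELIST:-unset}"
    echo "Nodes:      ${SLURM_JOB_NUM_NODES:-unset}"
    echo "Tasks:      ${SLURM_NTASKS:-unset}"
}

describe_job
```

Placed near the top of a batch script, a summary like this ends up in the job's output file, which can help when working out later how a particular run was scheduled.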
The following is a list of commonly used variables that are set by Slurm for each job, along with a brief description, sample value, and the nearest analog for PBS/Torque based schedulers. A full list of the variables set by Slurm for each job is available in the sbatch man page.
|Slurm Job Environment Variables|
|Slurm Variable Name||Description||Example values||PBS/Torque analog|
|$SLURM_JOBID||Deprecated. Same as SLURM_JOB_ID|
|$SLURM_JOB_NODELIST||Nodes assigned to job||compute-b24-[1-3,5-9],compute-b25-[1,4,8]||cat $PBS_NODEFILE|
|$SLURM_SUBMIT_HOST||Host submitted from||login-1.deepthought2.umd.edu||$PBS_O_HOST|
|$SLURM_JOB_NUM_NODES||Number of nodes allocated to job||2||$PBS_NUM_NODES|
|$SLURM_CPUS_ON_NODE||Number of cores/node||8,3||$PBS_NUM_PPN|
|$SLURM_NTASKS||Total number of tasks for job (equals the number of cores when each task uses one core)||11||$PBS_NP|
|$SLURM_NODEID||Index to node running on, relative to nodes assigned to job|
|$PBS_O_VNODENUM||Index to core running on
|$SLURM_PROCID||Index to task relative to job||0||$PBS_O_TASKNUM - 1|
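One possible use of the task index is to divide work among the tasks of a job. The sketch below assumes each task is started in a way that sets SLURM_PROCID and SLURM_NTASKS (for example via srun), and hands out numbered work items round-robin. The function name my_items and the defaults of 0 and 1 are illustrative assumptions so that the snippet also runs outside Slurm.

```shell
#!/bin/bash
# Sketch: print the work item numbers (0 .. total_items-1) that belong
# to this task, using a round-robin split across all tasks.
# my_items is a hypothetical helper name; the defaults below are an
# assumption so that the function also runs outside of a Slurm job.
my_items() {
    total_items=$1
    i=${SLURM_PROCID:-0}      # this task's index within the job
    step=${SLURM_NTASKS:-1}   # total number of tasks in the job
    while [ "$i" -lt "$total_items" ]; do
        echo "$i"
        i=$((i + step))
    done
}

# Outside Slurm this lists all ten items, since the defaults apply.
my_items 10
```

With 4 tasks, the task whose SLURM_PROCID is 1 would get items 1, 5 and 9 under this scheme.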
Scontrol and hostnames/hostlists
The list of nodes allocated to a job is presented in a compact notation,
in which square brackets (i.e. [ and ]) are
used to delimit lists and/or ranges of numeric values. This compact
form saves space in the environment and in displays, but is often
not the most useful in scripts, where a fully expanded list might
be more convenient.
To convert between the two formats, there are subcommands of the
scontrol command, e.g.
#Example of using scontrol show hostnames, using example from above
login-2:~: scontrol show hostnames 'compute-b24-[1-3,5-9],compute-b25-[1,4,8]'
compute-b24-1
compute-b24-2
compute-b24-3
compute-b24-5
compute-b24-6
compute-b24-7
compute-b24-8
compute-b24-9
compute-b25-1
compute-b25-4
compute-b25-8
login-2:~: #And now for the reverse
login-2:~: scontrol show hostlist 'compute-b24-1,compute-b24-2,compute-b24-3,compute-b24-5,compute-b24-6,compute-b24-7,compute-b24-8,compute-b24-9,compute-b25-1,compute-b25-4,compute-b25-8'
compute-b24-[1-3,5-9],compute-b25-[1,4,8]
login-2:~:
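Inside a job script, the same subcommand can be applied to the node list Slurm provides in the environment. The following is a minimal sketch, assuming the script runs within a Slurm job so that SLURM_JOB_NODELIST is set and scontrol is on the PATH; the function name expand_job_nodes is a hypothetical helper, not a Slurm command.

```shell
#!/bin/bash
# Sketch: expand the job's compact node list to one hostname per line.
# expand_job_nodes is a hypothetical helper name, not part of Slurm.
expand_job_nodes() {
    scontrol show hostnames "$SLURM_JOB_NODELIST"
}

# Only loop over the nodes when running inside a Slurm job, i.e. when
# SLURM_JOB_NODELIST is actually set.
if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    for host in $(expand_job_nodes); do
        echo "allocated node: $host"
    done
fi
```

The guard around the loop is a design choice for this sketch: it lets the script be sourced or tested on a login node without calling scontrol on an empty node list.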