Slurm Environment Variables

When a job scheduled by Slurm starts, it needs to know certain things about how it was scheduled, e.g., what its working directory is, or which nodes were allocated to it. Slurm passes this information to the job via environment variables. In addition to being available to your job, these variables are also used by programs like mpirun to set default values. This way, mpirun already knows how many tasks to start and on which nodes, without you needing to pass this information explicitly.
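
As a minimal sketch of how a job script might read these variables, the following shell fragment prints a short job summary. The function name print_job_summary and the "(not set)" placeholder are illustrative choices, not part of Slurm; the placeholder only lets the fragment run outside of a Slurm allocation.

```shell
#!/bin/bash
# Print a few of the variables Slurm sets for a job.
# The ${VAR:-default} form substitutes a placeholder when the variable
# is unset, e.g. when this is run outside of a Slurm job.
print_job_summary() {
    echo "Job ID:       ${SLURM_JOB_ID:-(not set)}"
    echo "Submitted in: ${SLURM_SUBMIT_DIR:-(not set)}"
    echo "Node list:    ${SLURM_JOB_NODELIST:-(not set)}"
    echo "Tasks:        ${SLURM_NTASKS:-(not set)}"
}

print_job_summary
```

Inside a running job, the placeholders would be replaced by the values Slurm assigned to that job.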

The following is a list of commonly used variables that are set by Slurm for each job, along with a brief description, a sample value, and the nearest analog for PBS/Torque-based schedulers. A full list of the variables set by Slurm for each job is available in the sbatch man page.

Slurm Job Environment Variables
| Slurm Variable Name | Description | Example values | PBS/Torque analog |
| --- | --- | --- | --- |
| $SLURM_JOB_ID | Job ID | 5741192 | $PBS_JOBID |
| $SLURM_JOBID | Deprecated. Same as $SLURM_JOB_ID | | |
| $SLURM_JOB_NAME | Job name | myjob | $PBS_JOBNAME |
| $SLURM_SUBMIT_DIR | Submit directory | /lustre/payerle/work | $PBS_O_WORKDIR |
| $SLURM_JOB_NODELIST | Nodes assigned to job | compute-b24-[1-3,5-9],compute-b25-[1,4,8] | cat $PBS_NODEFILE |
| $SLURM_SUBMIT_HOST | Host submitted from | login-1.deepthought2.umd.edu | $PBS_O_HOST |
| $SLURM_JOB_NUM_NODES | Number of nodes allocated to job | 2 | $PBS_NUM_NODES |
| $SLURM_CPUS_ON_NODE | Number of cores/node | 8,3 | $PBS_NUM_PPN |
| $SLURM_NTASKS | Total number of tasks for job (equals the total number of cores when each task uses one core) | 11 | $PBS_NP |
| $SLURM_NODEID | Index to node running on, relative to nodes assigned to job | 0 | $PBS_O_NODENUM |
| $SLURM_LOCALID | Index to core running on, within node | 4 | $PBS_O_VNODENUM |
| $SLURM_PROCID | Index to task, relative to job | 0 | $PBS_O_TASKNUM - 1 |
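
To illustrate how the index variables might be used, here is a sketch in which each task picks its own share of a set of work items based on $SLURM_PROCID and $SLURM_NTASKS. The function name task_range and the item count of 100 are hypothetical; the fallback values of 0 and 1 are assumptions that let the fragment run outside of Slurm.

```shell
#!/bin/bash
# Compute the half-open range [start, end) of work items for one task,
# dividing the items as evenly as possible among all tasks.
# Usage: task_range TOTAL_ITEMS [PROCID] [NTASKS]
# PROCID and NTASKS default to the values Slurm sets for the task.
task_range() {
    local total=$1
    local procid=${2:-${SLURM_PROCID:-0}}
    local ntasks=${3:-${SLURM_NTASKS:-1}}
    local base=$(( total / ntasks ))
    local extra=$(( total % ntasks ))   # the first 'extra' tasks get one more item
    local start end
    if [ "$procid" -lt "$extra" ]; then
        start=$(( procid * (base + 1) ))
        end=$(( start + base + 1 ))
    else
        start=$(( extra * (base + 1) + (procid - extra) * base ))
        end=$(( start + base ))
    fi
    echo "$start $end"
}

# Each task in the job sees its own SLURM_PROCID, so each gets its own range.
range=$(task_range 100)
start=${range% *}
end=${range#* }
echo "task ${SLURM_PROCID:-0} handles items $start to $(( end - 1 ))"
```

With the sample value of 11 tasks from the table, task 0 would get items 0 to 9 and each remaining task would get nine items.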

Scontrol and hostnames/hostlists

The list of nodes allocated to a job is presented in a compact notation, in which square brackets (i.e. [ and ]) are used to delimit lists and/or ranges of numeric values. This compact form saves space in the environment and in displays, but is often not the most useful in scripts, where a fully expanded list might be more convenient.

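The scontrol command provides subcommands to convert between the two formats: show hostnames expands a compact list into one hostname per line, and show hostlist compresses a comma-separated list of hostnames back into the compact form, e.g.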

#Example of using scontrol show hostnames, using example from above
login-2:~: scontrol show hostnames 'compute-b24-[1-3,5-9],compute-b25-[1,4,8]'
compute-b24-1
compute-b24-2
compute-b24-3
compute-b24-5
compute-b24-6
compute-b24-7
compute-b24-8
compute-b24-9
compute-b25-1
compute-b25-4
compute-b25-8
login-2:~:
#And now for the reverse
login-2:~: scontrol show hostlist 'compute-b24-1,compute-b24-2,compute-b24-3,compute-b24-5,compute-b24-6,compute-b24-7,compute-b24-8,compute-b24-9,compute-b25-1,compute-b25-4,compute-b25-8'
compute-b24-[1-3,5-9],compute-b25-[1,4,8]
login-2:~:
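
Within a job script, the same expansion could be applied to $SLURM_JOB_NODELIST. The sketch below writes the expanded list to a file, one hostname per line. The function name write_hostfile and the output file name are hypothetical, and the fragment assumes scontrol is on the PATH of the node running the script.

```shell
#!/bin/bash
# Write one hostname per line to the given file, expanding the compact
# node list with "scontrol show hostnames".
# Usage: write_hostfile OUTPUT_FILE [NODELIST]
# NODELIST defaults to the list Slurm assigned to the current job.
write_hostfile() {
    local outfile=$1
    local nodelist=${2:-${SLURM_JOB_NODELIST:-}}
    if [ -z "$nodelist" ]; then
        echo "no node list given and SLURM_JOB_NODELIST is not set" >&2
        return 1
    fi
    scontrol show hostnames "$nodelist" > "$outfile"
}

# Only attempt this inside a Slurm job, where the variable is set.
if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    hostfile="hosts.${SLURM_JOB_ID}.txt"   # hypothetical file name
    write_hostfile "$hostfile" &&
        echo "wrote $(wc -l < "$hostfile") hostnames to $hostfile"
fi
```

The resulting file can then be read line by line in a loop, which is often easier than parsing the bracketed notation by hand.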