SLURM Guide
- 1 What is SLURM?
- 2 Overview
- 3 Job Submission
- 4 Job Examples
- 5 How to test jobs using the Debug Partition
- 6 RAM Job Submission allocation
- 7 Testing/Debugging code if a job fails
- 8 Monitor CPU and Memory
- 9 Interactive jobs
- 10 Checking the Status of a Job
- 11 How to Release a Job from “JobHeldUser”
- 12 How to Terminate a Job
- 13 Priority Jobs
- 14 Checking the state of nodes and partitions
- 15 Job State Codes
- 16 Job Reason Codes
- 17 Related articles
What is SLURM?
SLURM, or Simple Linux Utility for Resource Management, is an HPC job scheduler that manages and allocates compute resources to ensure access is distributed fairly among users. For more info, see our glossary.
Overview
To optimally and fairly use the cluster, all application programs must be run using the job scheduler, SLURM.
When you use SLURM's sbatch command, your application program gets submitted as a "job". To better understand how applications get submitted as jobs, let's review the difference between login nodes and compute nodes.
Login nodes: When you connect to the cluster and see [<YourNetID>@login4 ~], you are connected to a single shared computer with all your fellow users, known as the "login node". The purpose of the "login" node is for you to submit jobs, copy data, edit programs, etc. The programs that are allowed to run on login nodes are listed in our usage policy.
Compute nodes: These computers do the heavy lifting of running your programs. However, you do not directly interact with compute nodes. You ask the scheduler for compute nodes to run your application program using SLURM, and then SLURM will find available compute nodes and run your application program on them.
Please do not run computationally-intensive programs on the login nodes. Doing so may slow down performance for other users, and your commands will be automatically throttled or terminated.
Job Submission
First, log in to the cluster:
$ ssh NetID@hpc2.storrs.hpc.uconn.edu
Use nano or your favorite text editor to create your job submission script. Here is a very simple job example:

[NetID@login4 ~]$ nano myJob.sh

#!/bin/bash
#SBATCH --ntasks=1     # Job only requires 1 CPU core
#SBATCH --time=5       # Job should run for no more than 5 minutes

echo "Hello, World"    # The actual command to run
Save your submission script and then submit your job with sbatch:

[NetID@login4 ~]$ sbatch myJob.sh
Submitted batch job 279934
You can view the status of your job with the squeue --me command (described later in this guide):
The output of your job will be in the current working directory in a file named slurm-JobID.out, where JobID is the number returned by sbatch in the example above.
Job Examples
The HPC cluster is segmented into groups of identical resources called Partitions. All jobs submitted to the cluster run within one of these Partitions. If you do not select a Partition explicitly, the scheduler will put your job into the default Partition, which is called general. Each Partition has defined limits for job runtime and core usage, with specific details available on the usage policy page. You can view a list of all partitions and their status by running the sinfo command.
There is also a knowledge base article, located here, that describes the available Partitions and the options that can be used to submit to them.
Below are multiple examples of how to submit a job in different scenarios.
Default (general) partition
The example job below requests 48 CPU cores for two hours, and emails the specified address upon completion:
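A sketch of what such a script might look like (the email address is a placeholder; substitute your own, and replace bash my_script.sh with your program):

#!/bin/bash
#SBATCH --partition=general                 # Default partition; this line can be omitted
#SBATCH --ntasks=48                         # Request 48 CPU cores
#SBATCH --time=02:00:00                     # Run for no more than 2 hours
#SBATCH --mail-type=END                     # Email when the job completes
#SBATCH --mail-user=first.last@uconn.edu    # Placeholder address

bash my_script.sh                           # Replace with your own command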
Test (debug) partition
This Partition allows you to request a single node, and run for up to 30 minutes.
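A minimal sketch of a debug submission (requesting 1 node and 4 cores, as referenced in the debug section later in this guide):

#!/bin/bash
#SBATCH --partition=debug       # Test/debug partition
#SBATCH --nodes=1               # Single node (the partition maximum)
#SBATCH --ntasks=4              # 4 CPU cores
#SBATCH --time=00:30:00         # Up to 30 minutes

bash my_script.sh               # Replace with your own command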
MPI-optimized (hi-core) partition
This Partition allows you to request up to 384 cores, and run for up to 6 hours.
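A sketch of an MPI submission to this partition (assumes an MPI module is already loaded; the program name is a placeholder):

#!/bin/bash
#SBATCH --partition=hi-core     # MPI-optimized partition
#SBATCH --ntasks=384            # Up to 384 cores
#SBATCH --time=06:00:00         # Up to 6 hours

mpirun ./my_mpi_program         # Replace with your own MPI command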
Single node (lo-core) partition
The lo-core partition allows you to request a runtime of up to 7 days.
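A sketch of a long-running single-node submission (the command is a placeholder):

#!/bin/bash
#SBATCH --partition=lo-core     # Single-node partition
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=7-00:00:00       # Up to 7 days

bash my_script.sh               # Replace with your own command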
Run a single multi-threaded job per node
In some cases, you are running a single multi-threaded program that needs to be spawned on a given node, and the program needs access to all available CPU cores on that node. To ensure all the CPUs of a given node are allocated to the program on that node, combine the --ntasks=1 flag with the --cpus-per-task= flag. The value that --cpus-per-task= is set to should equal the number of cores available on a given node, which varies from node to node. We have included a couple of examples below.
AMD-Epyc 128-Core Nodes
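A sketch for a 128-core AMD EPYC node (the constraint/feature name epyc128 is an assumption; check the Resources page or sinfo for the exact feature names on our cluster):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1                # One multi-threaded program
#SBATCH --cpus-per-task=128       # All cores on a 128-core AMD EPYC node
#SBATCH --constraint=epyc128      # Assumed feature name for these nodes
#SBATCH --time=02:00:00           # Example runtime

bash my_script.sh                 # Replace with your multi-threaded program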
AMD-Epyc 64-Core Nodes
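The same pattern for a 64-core AMD EPYC node (again, the feature name epyc64 is an assumption):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1                # One multi-threaded program
#SBATCH --cpus-per-task=64        # All cores on a 64-core AMD EPYC node
#SBATCH --constraint=epyc64       # Assumed feature name for these nodes
#SBATCH --time=02:00:00           # Example runtime

bash my_script.sh                 # Replace with your multi-threaded program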
General GPU Nodes
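A sketch for a general GPU node, assuming 64 CPU cores and at least one GPU card per node (verify the core and GPU counts on the Resources page):

#!/bin/bash
#SBATCH --partition=general-gpu   # GPU partition
#SBATCH --nodes=1
#SBATCH --ntasks=1                # One multi-threaded program
#SBATCH --cpus-per-task=64        # Assumed core count of a general GPU node
#SBATCH --gres=gpu:1              # Request one GPU card
#SBATCH --time=02:00:00           # Example runtime

bash my_script.sh                 # Replace with your multi-threaded program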
Please replace the bash my_script.sh line with your multi-threaded program command, and please see our Storrs HPC Resources page for info on the number of cores available on nodes of different architectures.
For a more extensive list of flags that can be used with the #SBATCH header or the fisbatch/srun command, see this table from our SLURM Cheatsheet.
How to test jobs using the Debug Partition
The debug partition is a great way to quickly troubleshoot and test code before running on a regular node, without running into long wait times.
The Debug Partition test example above shows how to submit to the debug partition, requesting 1 node and 4 cores for the debug job.
The following examples go into further detail for the Debug Partition and show a different way to debug code, either to determine whether there is a potential issue or to confirm that the code can run on the HPC hardware at the time of submission.
Different hardware is available to test within the Debug partition, which allows for users to troubleshoot their code on specific architectures.
Skylake node debug test
Test (debug) partition for Skylake nodes: This example requests a single Skylake node, uses its maximum of 36 CPU cores, and runs for 5 minutes (the debug partition allows jobs of up to 30 minutes).
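A sketch of such a test (the skylake feature name is an assumption; verify the exact constraint for our cluster):

#!/bin/bash
#SBATCH --partition=debug         # Test/debug partition
#SBATCH --nodes=1
#SBATCH --ntasks=36               # Max CPU cores on a Skylake node
#SBATCH --constraint=skylake      # Assumed feature name for Skylake nodes
#SBATCH --time=00:05:00           # 5 minutes

bash my_script.sh                 # Replace with the code you want to test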
AMD EPYC node debug test
Test (debug) partition for AMD EPYC nodes: This example requests a single AMD EPYC node, uses its maximum of 128 CPU cores, and runs for 5 minutes (the debug partition allows jobs of up to 30 minutes).
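A sketch of the equivalent AMD EPYC test (the epyc128 feature name is an assumption):

#!/bin/bash
#SBATCH --partition=debug         # Test/debug partition
#SBATCH --nodes=1
#SBATCH --ntasks=128              # Max CPU cores on a 128-core AMD EPYC node
#SBATCH --constraint=epyc128      # Assumed feature name for these nodes
#SBATCH --time=00:05:00           # 5 minutes

bash my_script.sh                 # Replace with the code you want to test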
RAM Job Submission allocation
There are multiple options that can be used to allocate memory in a SLURM job submission script.
Out-of-memory failures are fairly common when memory is not specified within a job submission script.
To avoid potential out-of-memory errors, there are a couple of ways to designate a RAM allocation within the job script.
The examples below showcase different ways to allocate memory for a Slurm job submitted to the general partition.
Full node memory allocation Skylake node
The following example will show how to fully request the RAM available on 1 Skylake node on HPC.
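A sketch of one way to do this; the 180G value is a placeholder for the full usable RAM of a Skylake node, and the skylake feature name is an assumption (check the Resources page for the exact numbers and names):

#!/bin/bash
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --ntasks=36               # All cores on a Skylake node
#SBATCH --constraint=skylake      # Assumed feature name
#SBATCH --mem=180G                # Placeholder for the node's full usable RAM
#SBATCH --time=02:00:00

bash my_script.sh                 # Replace with your own command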
Full node memory allocation AMD EPYC node with 128 CPU cores
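The equivalent sketch for a 128-core AMD EPYC node; the 500G value and the epyc128 feature name are assumptions to verify against the Resources page:

#!/bin/bash
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --ntasks=128              # All cores on a 128-core AMD EPYC node
#SBATCH --constraint=epyc128      # Assumed feature name
#SBATCH --mem=500G                # Placeholder for the node's full usable RAM
#SBATCH --time=02:00:00

bash my_script.sh                 # Replace with your own command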
Full node memory allocation and request 2 out of 3 available GPU cards
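A sketch combining a full-node memory request with 2 of a node's 3 GPU cards; the 480G value is a placeholder for the GPU node's full usable RAM:

#!/bin/bash
#SBATCH --partition=general-gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2              # 2 of the 3 GPU cards on the node
#SBATCH --mem=480G                # Placeholder for the node's full usable RAM
#SBATCH --time=02:00:00

bash my_script.sh                 # Replace with your own command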
Testing/Debugging code if a job fails
If a job fails and a coding issue is suspected, it is recommended to debug and test the code in the debug partition.
It is also recommended to submit a simple test program to the debug partition to confirm that a basic job runs.
If a simple test program works, then the issue may lie with the current input file or the code being executed within the job submission script.
Monitor CPU and Memory
Completed Jobs
Slurm records statistics for every job, including how much memory and CPU was used.
seff

After the job completes, you can run seff <jobid> to get some useful information about your job, including the memory used and what percent of your allocated memory that amounts to.
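For example, using the job ID returned by sbatch earlier in this guide:

[NetID@login4 ~]$ seff 279934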
It is recommended to test various job submissions and tune the memory request to the code being submitted.
RAM is a trackable resource on the HPC cluster.
By default, 2 GB of RAM is assigned per core allocated to the job.
Specifying the amount of RAM in the submission script can still be used to set a maximum RAM value, but it is recommended to first submit jobs without the SLURM #SBATCH --mem header so that their memory usage can be profiled.
Interactive jobs
If you require an interactive job, use the srun command instead of sbatch. This command does not use a submission script. Instead, all of the options from the submission script are given on the command line, without the #SBATCH keyword.
Basic example:
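One common form, requesting a single core for 30 minutes and opening a bash shell on the compute node (adjust the options as needed):

[NetID@login4 ~]$ srun --ntasks=1 --time=30 --pty bash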
There is an alternative command called fisbatch that works similarly to srun, but it is a little older and can have bugs when running an interactive job.

To use a custom partition with interactive jobs, specify the --partition parameter.

If you suddenly lose the connection to the interactive screen, you can try the following srun or fisbatch steps to link back. Re-attach to an interactive job: first, get the JobID of the srun or fisbatch job; then re-attach to the job by JobID using the sattach or fisattach command. Illustrative commands for these steps are shown below.
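Illustrative commands for these steps; the partition name, JobID, and step ID are placeholders:

# Interactive job on a specific partition
[NetID@login4 ~]$ srun --partition=debug --ntasks=1 --time=30 --pty bash

# Find the JobID of your running interactive job
[NetID@login4 ~]$ squeue --me

# Re-attach to the interactive job step (JobID.StepID; step 0 is typical for srun)
[NetID@login4 ~]$ sattach 279934.0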
Checking the Status of a Job
To view your active jobs:
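For example, with the squeue --me command mentioned earlier in this guide:

[NetID@login4 ~]$ squeue --me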
Alternatively, the squeue command may be more descriptive for jobs in a PENDING state:

To view your job history:
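Illustrative commands for these two tasks (the exact flags may vary):

# Long/detailed listing, helpful for PENDING jobs
[NetID@login4 ~]$ squeue --me --long

# Your recent job history from the Slurm accounting database
[NetID@login4 ~]$ sacct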
To view all the jobs in the cluster:
To view details about nodes and partitions:
To review all the job logs:
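Possible commands for the three items above (the nodeinfo wrapper described later in this guide is an alternative to sinfo):

# All jobs in the cluster
[NetID@login4 ~]$ squeue

# Details about nodes and partitions
[NetID@login4 ~]$ sinfo

# Detailed accounting records for your jobs; per-job output also lands in slurm-<JobID>.out files
[NetID@login4 ~]$ sacct --long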
How to Release a Job from “JobHeldUser”
If you have a job that has been “held,” it will not be able to run until it is “released.” You will know your job is being held if squeue says the job state is SE (Special Exit) and the reason is “JobHeldUser.” We have a deeper explanation for why this happens in our FAQ here. But in brief, you can release a job using its JobID or JobName.
To release a single job
To release all jobs with a given job name
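Illustrative commands for both cases (the JobID and JobName are placeholders; recent Slurm versions accept a jobname= specification with scontrol):

# Release a single job
[NetID@login4 ~]$ scontrol release <JobID>

# Release all jobs with a given job name
[NetID@login4 ~]$ scontrol release jobname=<JobName>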
How to Terminate a Job
To terminate a single job:
To terminate all of your jobs:
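Illustrative commands for both cases:

# Terminate a single job (JobID is a placeholder)
[NetID@login4 ~]$ scancel <JobID>

# Terminate all of your jobs
[NetID@login4 ~]$ scancel -u NetID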
Priority Jobs
If you have been granted access to priority resources through our condo model then you need to submit your jobs using a custom Partition in order to avoid resource limits.
For example:
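A sketch, assuming a priority partition named priority and an account named mylab; replace both with the names assigned to your group:

#!/bin/bash
#SBATCH --partition=priority      # Placeholder priority partition name
#SBATCH --account=mylab           # Placeholder account name
#SBATCH --ntasks=48
#SBATCH --time=12:00:00

bash my_script.sh                 # Replace with your own command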
Checking the state of nodes and partitions
You can view a listing of nodes and see what resources are currently being used with the nodeinfo command. This may be helpful for determining whether there are available nodes with sufficient resources to backfill your jobs.

The following example shows all idle or mixed-state nodes available in the general-gpu partition:
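One way to do this with sinfo (the nodeinfo wrapper may format the output differently):

[NetID@login4 ~]$ sinfo -p general-gpu -t idle,mixed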
Nodes in the “idle” state have all cores available, whereas nodes in the “mix” state generally have some cores available. If you want to target the remaining available cores on a node in the mix state, it can be helpful to figure out how many cores are free. To check that, we can use the sinfo -n <node_name> command to look at a specific node, in this case gpu14:
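For example, using an output format that includes the CPU state column (the %C field); the output line is illustrative:

[NetID@login4 ~]$ sinfo -n gpu14 -o "%n %C"
HOSTNAMES CPUS(A/I/O/T)
gpu14 36/28/0/64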
This output reports CPUs in one of four states: A, allocated; I, idle; O, other (ignore this one); and T, total. From this output we can see that 36 of gpu14’s 64 cores are allocated and 28 are available. Now, the one tricky part about the Storrs HPC is that even though it says 28 are available, there are really only 26, because the first two cores on every node in the general-access partitions are reserved for HPC administration tasks. So if we wanted to target gpu14, we would have to use a command like the one below, which asks for 26 cores (or fewer).
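A sketch of such a submission (using the myJob.sh script from earlier; --nodelist pins the job to gpu14):

[NetID@login4 ~]$ sbatch --partition=general-gpu --nodelist=gpu14 --ntasks=26 myJob.sh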
Job State Codes
Status | Code | Explanation |
---|---|---|
COMPLETED | CD | The job has completed successfully. |
COMPLETING | CG | The job is finishing but some processes are still active. |
FAILED | F | The job terminated with a non-zero exit code and failed to execute. |
PENDING | PD | The job is waiting for resource allocation. It will eventually run. |
PREEMPTED | PR | The job was terminated because of preemption by another job. |
RUNNING | R | The job currently is allocated to a node and is running. |
SUSPENDED | S | A running job has been stopped with its cores released to other jobs. |
STOPPED | ST | A running job has been stopped with its cores retained. |
SPECIAL EXIT | SE | The job failed. Check logs to determine the reason and either release or cancel the job. |
A full list of these Job State codes can be found in Slurm’s documentation.
Job Reason Codes
Reason Code | Explanation |
---|---|
Priority | One or more higher priority jobs is in queue for running. Your job will eventually run. |
Dependency | This job is waiting for a dependent job to complete and will run afterwards. |
Resources | The job is waiting for resources to become available and will eventually run. |
InvalidAccount | The job’s account is invalid. Cancel the job and rerun with the correct account. |
InvalidQOS | The job’s QoS is invalid. Cancel the job and rerun with the correct account. |
QOSGrpCpuLimit | All CPUs assigned to your job’s specified QoS are in use; job will run eventually. |
QOSGrpMaxJobsLimit | Maximum number of jobs for your job’s QoS have been met; job will run eventually. |
QOSGrpNodeLimit | All nodes assigned to your job’s specified QoS are in use; job will run eventually. |
PartitionCpuLimit | All CPUs assigned to your job’s specified partition are in use; job will run eventually. |
PartitionMaxJobsLimit | Maximum number of jobs for your job’s partition have been met; job will run eventually. |
PartitionNodeLimit | Node number requested exceeds limit; user needs to decrease the number of nodes requested to below the limit. |
AssociationCpuLimit | All CPUs assigned to your job’s specified association are in use; job will run eventually. |
AssociationMaxJobsLimit | Maximum number of jobs for your job’s association have been met; job will run eventually. |
AssociationNodeLimit | All nodes assigned to your job’s specified association are in use; job will run eventually. |
A full list of these Job Reason Codes can be found in Slurm’s documentation.
Related articles