SLURM Guide

What is SLURM?

SLURM (Simple Linux Utility for Resource Management) is an HPC job scheduler that manages and allocates compute resources so that access is distributed fairly among users. For more info, see our glossary.

Overview

To optimally and fairly use the cluster, all application programs must be run using the job scheduler, SLURM.

When you use SLURM's sbatch command, your application program gets submitted as a "job". To better understand how applications get submitted as jobs, let's review the difference between login nodes and compute nodes.

Login nodes: When you connect to the cluster and see [<YourNetID>@login4 ~], you are on a single computer shared with all your fellow users, known as the "login node". The purpose of the login node is for you to submit jobs, copy data, edit programs, etc. The programs that are allowed to run on login nodes are listed in our usage policy.

Compute nodes: These computers do the heavy lifting of running your programs. However, you do not interact with compute nodes directly. Instead, you ask SLURM for compute nodes, and SLURM finds available compute nodes and runs your application program on them.

Please do not run computationally-intensive programs on the login nodes. Doing so may slow down performance for other users, and your commands will be automatically throttled or terminated.

Job Submission

  1. First, log in to the cluster:

    $ ssh NetID@hpc2.storrs.hpc.uconn.edu
  2. Use nano or your favorite text editor to create your job submission script. Here is a very simple job example:

    [NetID@login4 ~]$ nano myJob.sh

    #!/bin/bash
    #SBATCH --ntasks=1     # Job only requires 1 CPU core
    #SBATCH --time=5       # Job should run for no more than 5 minutes
    echo "Hello, World"    # The actual command to run
  3. Save your submission script and then submit your job with sbatch:

    [NetID@login4 ~]$ sbatch myJob.sh
    Submitted batch job 279934
  4. You can view the status of your job with the squeue --me command (described later in this guide):
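
    For example (the JOBID column will show the number that sbatch returned):

    [NetID@login4 ~]$ squeue --me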

The output of your job will be in the current working directory in a file named slurm-JobID.out, where JobID is the number returned by sbatch in the example above.

Job Examples

The HPC cluster is segmented into groups of identical resources called Partitions. All jobs submitted to the cluster run within one of these Partitions. If you do not select a Partition explicitly, the scheduler will put your job into the default Partition, which is called general. Each Partition has defined limits for job runtime and core usage, with specific details available on the usage policy page. You can view a list of all partitions and their status by running the sinfo command.

There is also a knowledge base article, located here, that lists the available Partitions and the options used to submit to each of them.

Below are multiple examples of how to submit a job in different scenarios.

 

Default (general) partition

The example job below requests 48 CPU cores for two hours, and emails the specified address upon completion:
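
A minimal sketch of such a submission script (the email address is a placeholder; the exact options in the original example may differ):

    #!/bin/bash
    #SBATCH --partition=general             # Default partition
    #SBATCH --ntasks=48                     # 48 CPU cores
    #SBATCH --time=02:00:00                 # Two-hour walltime limit
    #SBATCH --mail-type=END                 # Email when the job completes
    #SBATCH --mail-user=NetID@uconn.edu     # Replace with your email address

    bash my_script.sh                       # Replace with your program command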

 

Test (debug) partition

This Partition allows you to request a single node, and run for up to 30 minutes.
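
A sketch of a debug submission script, matching the test described later in this guide (1 node, 4 cores; the partition name "debug" is taken from the section title):

    #!/bin/bash
    #SBATCH --partition=debug       # Test partition
    #SBATCH --nodes=1               # A single node
    #SBATCH --ntasks=4              # 4 CPU cores
    #SBATCH --time=00:30:00         # Up to 30 minutes

    bash my_script.sh               # Replace with the code you are testing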

 

MPI-optimized (hi-core) partition

This Partition allows you to request up to 384 cores, and run for up to 6 hours.
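
A sketch of an MPI job script for this partition (the module name and launch line are placeholders for your own environment):

    #!/bin/bash
    #SBATCH --partition=hi-core     # MPI-optimized partition
    #SBATCH --ntasks=384            # Up to 384 MPI ranks
    #SBATCH --time=06:00:00         # Up to 6 hours

    module load mpi                 # Placeholder: load the MPI module you actually use
    mpirun ./my_mpi_program         # Placeholder: your MPI program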

 

Single node (lo-core) partition

The lo-core partition allows you to request a runtime of up to 7 days.
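
A sketch of a long-running single-node job script for this partition:

    #!/bin/bash
    #SBATCH --partition=lo-core     # Single-node partition
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --time=7-00:00:00       # Up to 7 days (days-hours:minutes:seconds)

    bash my_script.sh               # Replace with your program command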


Run a single multi-threaded job per node

In some cases, you are running a single multi-threaded program that needs access to all available CPU cores on a node. To ensure that all of a node's CPUs are allocated to the program, use the --ntasks=1 flag combined with the --cpus-per-task= flag. The value that --cpus-per-task= is set to should equal the number of cores available on the node, which varies from node type to node type. We have included a couple of examples below.

 

AMD-Epyc 128-Core Nodes
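
A sketch for a 128-core AMD EPYC node (the --constraint feature name is an assumption; check the Storrs HPC Resources page for the exact name):

    #!/bin/bash
    #SBATCH --ntasks=1                # One task: a single program instance
    #SBATCH --cpus-per-task=128       # All 128 cores on the node
    #SBATCH --constraint=epyc128      # Assumed feature name for 128-core AMD EPYC nodes

    bash my_script.sh                 # Replace with your multi-threaded program command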

 

AMD-Epyc 64-Core Nodes
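
The same sketch for a 64-core AMD EPYC node (again, the --constraint feature name is an assumption):

    #!/bin/bash
    #SBATCH --ntasks=1                # One task: a single program instance
    #SBATCH --cpus-per-task=64        # All 64 cores on the node
    #SBATCH --constraint=epyc64       # Assumed feature name for 64-core AMD EPYC nodes

    bash my_script.sh                 # Replace with your multi-threaded program command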

 

General GPU Nodes
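
A sketch for a general GPU node (the 64-core count matches the gpu14 example later in this guide; adjust the core and GPU counts for the node you target):

    #!/bin/bash
    #SBATCH --partition=general-gpu   # General GPU partition
    #SBATCH --ntasks=1                # One task: a single program instance
    #SBATCH --cpus-per-task=64        # All cores on the GPU node (check the Resources page)
    #SBATCH --gres=gpu:1              # Request one GPU card

    bash my_script.sh                 # Replace with your multi-threaded program command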

 

Please replace the bash my_script.sh line with your multi-threaded program command, and please see our Storrs HPC Resources page for info on the number of cores available on nodes of different architectures.

For a more extensive list of flags that can be used with the #SBATCH header or the fisbatch/srun command, see this table from our SLURM Cheatsheet.


How to test jobs using the Debug Partition

The debug partition is a great way to quickly troubleshoot and test code, without running into long wait times, before running it on the other partitions.

The Debug Partition test example above shows how to submit to the debug partition, requesting 1 node and 4 cores for the debug job.

The following examples go into further detail for the Debug Partition and show a different way to debug code, either to determine whether there is a potential issue or to confirm that the code can run on the HPC hardware at the time of submission.

Different hardware is available to test within the Debug partition, which allows users to troubleshoot their code on specific architectures.

Skylake node debug test

Test (debug) partition for Skylake nodes: this example requests a single Skylake node with its maximum of 36 CPU cores and runs for 5 minutes (the debug partition allows up to 30 minutes).
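
A sketch of this test (the --constraint feature name for Skylake nodes is an assumption):

    #!/bin/bash
    #SBATCH --partition=debug         # Test partition
    #SBATCH --nodes=1                 # A single node
    #SBATCH --ntasks=36               # Max of 36 CPU cores on a Skylake node
    #SBATCH --constraint=skylake      # Assumed feature name for Skylake nodes
    #SBATCH --time=00:05:00           # Run for 5 minutes

    bash my_script.sh                 # Replace with the code you are testing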

 

AMD EPYC node debug test

Test (debug) partition for AMD EPYC nodes: this example requests a single AMD EPYC node with its maximum of 128 CPU cores and runs for 5 minutes (the debug partition allows up to 30 minutes).
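
A sketch of this test (the --constraint feature name for AMD EPYC nodes is an assumption):

    #!/bin/bash
    #SBATCH --partition=debug         # Test partition
    #SBATCH --nodes=1                 # A single node
    #SBATCH --ntasks=128              # Max of 128 CPU cores on an AMD EPYC node
    #SBATCH --constraint=epyc128      # Assumed feature name for 128-core AMD EPYC nodes
    #SBATCH --time=00:05:00           # Run for 5 minutes

    bash my_script.sh                 # Replace with the code you are testing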

RAM Allocation in Job Submissions

There are multiple options for allocating memory in a SLURM job submission script.

Out-of-memory failures are fairly common when memory is not specified within a job submission script.

To avoid potential out-of-memory errors, there are a couple of ways to designate a RAM request within the job script.

The examples below showcase different ways to allocate memory for a Slurm job submitted to the general partition.

 

Full node memory allocation on a Skylake node

The following example shows how to request all of the RAM available on one Skylake node.
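
A sketch of such a script; --mem=0 asks Slurm for all of the memory available on the node (you could instead give an explicit value matching the node's installed RAM; the --constraint name is an assumption):

    #!/bin/bash
    #SBATCH --partition=general       # Default partition
    #SBATCH --nodes=1                 # A single node
    #SBATCH --ntasks=36               # All 36 cores on a Skylake node
    #SBATCH --constraint=skylake      # Assumed feature name for Skylake nodes
    #SBATCH --mem=0                   # Request all memory available on the node

    bash my_script.sh                 # Replace with your program command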

 

Full node memory allocation on an AMD EPYC node with 128 CPU cores
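
The same idea for a 128-core AMD EPYC node (the --constraint name is an assumption):

    #!/bin/bash
    #SBATCH --partition=general       # Default partition
    #SBATCH --nodes=1                 # A single node
    #SBATCH --ntasks=128              # All 128 cores on the node
    #SBATCH --constraint=epyc128      # Assumed feature name for 128-core AMD EPYC nodes
    #SBATCH --mem=0                   # Request all memory available on the node

    bash my_script.sh                 # Replace with your program command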

 

Full node memory allocation with 2 of the 3 available GPU cards
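
A sketch combining a full-node memory request with 2 of the node's 3 GPU cards (the core count is an assumption; adjust it for the GPU node you target):

    #!/bin/bash
    #SBATCH --partition=general-gpu   # General GPU partition
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=64        # Adjust to the GPU node's core count
    #SBATCH --mem=0                   # Request all memory available on the node
    #SBATCH --gres=gpu:2              # 2 of the 3 available GPU cards

    bash my_script.sh                 # Replace with your program command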

Testing/Debugging code if a job fails

If a job fails and a coding issue is suspected, it is recommended to debug and test the code in the debug partition.

It is also recommended to submit a simple test program to the debug partition to confirm that a basic job runs.

If a simple test program works, then the problem is likely with the input file or the code being executed within the job submission script.

Monitor CPU and Memory

Completed Jobs

Slurm records statistics for every job, including how much memory and CPU was used.

seff

After the job completes, you can run seff <jobid> to get some useful information about your job, including the memory used and what percent of your allocated memory that amounts to.
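
For example, using the job ID returned by sbatch earlier in this guide:

    [NetID@login4 ~]$ seff 279934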

It is recommended to test a few job submissions and check them with seff to see how much memory your code actually uses.

RAM is a trackable resource on the HPC cluster.

By default, 2G of RAM is assigned per core allocated to the job.

You can still use the #SBATCH --mem header in the submission script to set a maximum RAM value, but it is recommended to first submit jobs without it and use seff to profile their actual memory usage.

Interactive jobs

If you require an interactive job, use the srun command instead of sbatch. This command does not use a submission script. Instead, all of the options from the submission script are given on the command line, without the #SBATCH keyword.

  1. Basic example:
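
    A minimal sketch of an interactive session (adjust the resources and time to your needs):

    [NetID@login4 ~]$ srun --ntasks=1 --time=00:30:00 --pty bash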

     

  2. There is an alternative command called fisbatch that works similarly to srun, but it is a little older and can be buggy when running an interactive job.

  3. To use a custom partition with interactive jobs, specify the --partition parameter:
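
    For example, to start an interactive session in the debug partition:

    [NetID@login4 ~]$ srun --partition=debug --ntasks=4 --time=00:30:00 --pty bash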

  4. If you suddenly lose the connection to the interactive screen, you can try the following srun or fisbatch commands to link back.

  5. Re-attach to an interactive job: First you need to get the JobID of the srun or fisbatch job:
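
    One way to do that is to list your own jobs and note the JOBID column:

    [NetID@login4 ~]$ squeue --me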

    Then, you can re-attach to the job by JobID using the following sattach or fisattach commands:
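
    A sketch using sattach, which attaches to a job step given as JobID.StepID (step 0 is typical for a single srun; fisattach plays the same role for fisbatch jobs):

    [NetID@login4 ~]$ sattach <JobID>.0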

     

Checking the Status of a Job

To view your active jobs:
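
One way is with squeue, limited to your own jobs:

    [NetID@login4 ~]$ squeue --me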

  1. Alternatively, the squeue command may be more descriptive for jobs in a PENDING state:
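
    A sketch; adding --start reports the scheduler's estimated start times for pending jobs:

    [NetID@login4 ~]$ squeue --me --start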

  2. To view your job history:
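
    A sketch using sacct (adjust the time window and output fields as needed):

    [NetID@login4 ~]$ sacct -u $USER -S now-7days -o JobID,JobName,Partition,State,Elapsed,ExitCode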

  3. To view all the jobs in the cluster:
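
    Running squeue with no options lists every job on the cluster:

    [NetID@login4 ~]$ squeue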

  4. To view details about nodes and partitions:
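
    For example, sinfo alone summarizes partitions, and -N -l gives a per-node listing:

    [NetID@login4 ~]$ sinfo
    [NetID@login4 ~]$ sinfo -N -l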

  5. To review all the job logs:
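
    Job output is written to slurm-JobID.out files in the directory you submitted from, so a simple way to review the logs is:

    [NetID@login4 ~]$ ls slurm-*.out
    [NetID@login4 ~]$ less slurm-279934.out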

How to Release a Job from “JobHeldUser”

If you have a job that has been “held,” it will not be able to run until it is “released.” You will know your job is being held if squeue says the job state is SE (Special Exit) and the reason says “JobHeldUser.” We have a deeper explanation for why this happens in our FAQ here. But in brief, you can release a job using its jobID or jobName.

  1. To release a single job
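
    For example, using the JobID reported by squeue:

    [NetID@login4 ~]$ scontrol release <JobID>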

  2. To release all jobs with a given job name
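
    A sketch; scontrol's hold and release commands also accept a jobname= specification:

    [NetID@login4 ~]$ scontrol release jobname=<JobName>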

How to Terminate a Job

  1. To terminate a single job:
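
    For example:

    [NetID@login4 ~]$ scancel <JobID>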

  2. To terminate all of your jobs:
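
    For example (this cancels every one of your jobs, so use it carefully):

    [NetID@login4 ~]$ scancel -u $USER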

Priority Jobs

If you have been granted access to priority resources through our condo model, then you need to submit your jobs using a custom Partition in order to avoid resource limits.

  1. For example:
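
    A sketch; the partition name is a placeholder for whatever your group was assigned:

    [NetID@login4 ~]$ sbatch --partition=<your_priority_partition> myJob.sh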

Checking the state of nodes and partitions

You can view a listing of nodes and see what resources are currently being used with the nodeinfo command. This may be helpful for determining whether there are available nodes with sufficient resources to backfill your jobs.

The following example shows all nodes in the idle or mix state in the general-gpu partition:
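
A sketch using sinfo's state filter (the site-provided nodeinfo command may format things differently):

    [NetID@login4 ~]$ sinfo -p general-gpu -t idle,mix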

Nodes in the “idle” state have all cores available, whereas nodes in the “mix” state generally have some cores available. If you want to target the remaining available cores on a node in the mix state, it can be helpful to figure out how many cores are free. To check, we can use the sinfo -n <node_name> command to look at a specific node, in this case gpu14:
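
A sketch; the -o format string is one way to print the node name alongside the CPU counts in the A/I/O/T form described below:

    [NetID@login4 ~]$ sinfo -n gpu14 -o "%n %C"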

This output reports CPUs in one of four states: A, allocated; I, idle; O, other (ignore this one); and T, total. From this output we can see that 36 of gpu14’s 64 cores are allocated and 28 are available. Now, the one tricky part about the Storrs HPC is that even though it says 28 are available, there are really only 26, because the first two cores on every node in the general access partitions are reserved for HPC administration tasks. So if we wanted to target gpu14, we would have to use a command like the one below, which asks for 26 cores (or fewer).
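
A sketch of such a submission (--nodelist pins the job to gpu14):

    [NetID@login4 ~]$ sbatch --partition=general-gpu --nodelist=gpu14 --ntasks=26 myJob.sh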

Job State Codes

Status          Code    Explanation
COMPLETED       CD      The job has completed successfully.
COMPLETING      CG      The job is finishing but some processes are still active.
FAILED          F       The job terminated with a non-zero exit code and failed to execute.
PENDING         PD      The job is waiting for resource allocation. It will eventually run.
PREEMPTED       PR      The job was terminated because of preemption by another job.
RUNNING         R       The job is currently allocated to a node and running.
SUSPENDED       S       A running job has been stopped with its cores released to other jobs.
STOPPED         ST      A running job has been stopped with its cores retained.
SPECIAL EXIT    SE      The job failed. Check the logs to determine the reason, then either release or cancel the job.

A full list of these Job State codes can be found in Slurm’s documentation.

Job Reason Codes

Reason Code                Explanation
Priority                   One or more higher priority jobs are in the queue. Your job will eventually run.
Dependency                 This job is waiting for a dependent job to complete and will run afterwards.
Resources                  The job is waiting for resources to become available and will eventually run.
InvalidAccount             The job's account is invalid. Cancel the job and resubmit with the correct account.
InvalidQOS                 The job's QoS is invalid. Cancel the job and resubmit with the correct QoS.
QOSGrpCpuLimit             All CPUs assigned to your job's specified QoS are in use; the job will run eventually.
QOSGrpMaxJobsLimit         The maximum number of jobs for your job's QoS has been reached; the job will run eventually.
QOSGrpNodeLimit            All nodes assigned to your job's specified QoS are in use; the job will run eventually.
PartitionCpuLimit          All CPUs assigned to your job's specified partition are in use; the job will run eventually.
PartitionMaxJobsLimit      The maximum number of jobs for your job's partition has been reached; the job will run eventually.
PartitionNodeLimit         The number of nodes requested exceeds the partition's limit; decrease the number of nodes requested.
AssociationCpuLimit        All CPUs assigned to your job's association are in use; the job will run eventually.
AssociationMaxJobsLimit    The maximum number of jobs for your job's association has been reached; the job will run eventually.
AssociationNodeLimit       All nodes assigned to your job's association are in use; the job will run eventually.

A full list of these Job Reason Codes can be found in Slurm’s documentation.
