Partitions / Storrs HPC Resources

I. Partitions of the Storrs HPC

Storrs HPC is divided into several partitions. Each partition is a group of nodes that share similar hardware (e.g., GPUs), types of usage (e.g., long jobs vs. short, high-throughput jobs), and/or the level of priority required to access them. All users have access to the general, general-gpu, debug, lo-core, and hi-core partitions; access to the other, priority-based partitions can be purchased.
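If you want to confirm what is available from the cluster itself, Slurm's sinfo command summarizes each partition's time limit, node count, cores, and memory per node. The sketch below uses standard sinfo options and can be run from any login node.

# One-line summary per partition (availability, time limit, node counts)
sinfo -s

# Partition name, time limit, node count, CPUs per node, and memory (MB) per node
sinfo -o "%12P %10l %6D %5c %8m"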

Storrs HPC also offers a wide variety of computational architectures, each with different strengths. Some nodes have many cores, others have large amounts of RAM, and others are paired with GPUs. Selecting optimal hardware can increase the efficiency of your research, but the definition of "optimal" will differ depending on how you're using the HPC. The list below summarizes the resources available in each partition.

List of Partitions: 

general
   Max wall time: 12 hours
   Nodes: 13 Skylake, 41 Epyc64, 148 Epyc128
   Cores available per node*: 34 (Skylake), 62 (Epyc64), 126 (Epyc128)
   Total cores: 442 (Skylake), 2,542 (Epyc64), 18,648 (Epyc128)
   GPUs available per node: n/a
   RAM per node (GB): 187 (Skylake), 503 (Epyc64), 503 (Epyc128)
   Use: General-use, free access. 8 node limit per job.

general-gpu
   Max wall time: 12 hours
   Nodes: 2 Skylake, 28 Epyc64
   Cores available per node*: 34 (Skylake), 62 (Epyc64)
   Total cores: 68 (Skylake), 1,736 (Epyc64)
   GPUs available per node: 1 or 3
   RAM per node (GB): 187 (Skylake), 503 (Epyc64)
   Use: General-use, free access. 2 node limit per job.

preempt
   Max wall time: 12 hours
   Nodes: 7 Epyc128
   Cores available per node*: 126
   Total cores: 882
   GPUs available per node: n/a
   RAM per node (GB): 503
   Use: QoS*-driven; highest priority.

lo-core
   Max wall time: 7 days
   Nodes: 18 Epyc128
   Cores available per node*: 126
   Total cores: 2,268
   GPUs available per node: n/a
   RAM per node (GB): 503
   Use: Long-running serial jobs. 4 node limit per job.

hi-core
   Max wall time: 6 hours
   Nodes: 28 Epyc128
   Cores available per node*: 126
   Total cores: 3,528
   GPUs available per node: n/a
   RAM per node (GB): 503
   Use: Highly parallel jobs. 16 node limit per job.

debug
   Max wall time: 30 minutes
   Nodes: 1, 11, 2
   Architecture: Skylake, Epyc64, Epyc128, Epyc64-A100 GPU
   Cores available per node*: 34 (Skylake), 62 (Epyc64), 126 (Epyc128), 62 (Epyc64-A100 GPU)
   Total cores: 34, 62, 126, 62
   GPUs available per node: n/a, except 1 per Epyc64-A100 GPU node
   RAM per node (GB): 187 (Skylake), 503 (Epyc64), 503 (Epyc128), 503 (Epyc64-A100 GPU)
   Use: Job submission testing. 2 node limit per job.

priority
   Max wall time: Unlimited
   Nodes: QoS*
   Architecture: Skylake, Epyc64, Epyc128
   Cores available per node*: 34 (Skylake), 62 (Epyc64), 126 (Epyc128)
   Total cores: QoS*
   GPUs available per node: n/a
   RAM per node (GB): 187 (Skylake), 503 (Epyc64), 503 (Epyc128)
   Use: All condo/priority CPU jobs.

priority-gpu
   Max wall time: Unlimited
   Nodes: QoS*
   Architecture: Intel^, Skylake-30^, Skylake-34, Epyc64
   Cores available per node*: 20^ (Intel), 30^ (Skylake-30), 34 (Skylake-34), 62 (Epyc64)
   Total cores: QoS*
   GPUs available per node: 2 or 3^ (Intel), 8^ (Skylake-30), 1 or 3 (Skylake-34), 1, 3, or 4 (Epyc64)
   RAM per node (GB): 125^ (Intel), 376^ (Skylake-30), 187 (Skylake-34), 503 (Epyc64)
   Use: All condo/priority GPU jobs.

class
   Max wall time: 4 hours
   Nodes: 12 Skylake, 1 Skylake-GPU, 12 Epyc128
   Cores available per node*: 34 (Skylake), 34 (Skylake-GPU), 126 (Epyc128)
   Total cores: 408 (Skylake), 34 (Skylake-GPU), 1,512 (Epyc128)
   GPUs available per node: n/a, except 1 on the Skylake-GPU node
   RAM per node (GB): 187 (Skylake), 187 (Skylake-GPU), 503 (Epyc128)
   Use: For classroom/instructional use.

osg
   Max wall time: 2 days
   Nodes: QoS*
   Architecture: Epyc64
   Cores available per node*: 62 (124 threads)
   Total cores: 64
   GPUs available per node: n/a
   RAM per node (GB): 503
   Use: OSG only.

*Available cores per node – 2 cores are reserved per node for the OS and storage processes. This does not apply to the Haswell and Broadwell architectures.

*QoS – You can specify a Quality of Service (QoS) for each job submitted to Slurm. The QoS associated with a job affects the group's maximum cumulative core and GPU count (known as Group Trackable RESources, or GrpTRES) as well as job priority. These limits are determined by the number of cores and GPUs a PI purchases under the condo model. You can inspect the QoS values and limits attached to your account with the sacctmgr commands shown after these notes.

^ – denotes nodes with consumer-grade GPUs (see GPU Specifications below).

NOTE – Total core counts depend on the node type/CPU architecture. The 128-core Epyc nodes represent the top end of the range; OSG Epyc nodes have 64 cores. Priority/preempt node assignments overlap with the general partition.
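If your group has purchased priority or preempt access, you can check which QoS values are attached to your account and what limits they carry. The sketch below uses standard sacctmgr queries; the bracketed QoS name is a placeholder for your group's QoS.

# List the account(s), partition(s), and QoS values associated with your user
sacctmgr show assoc user=$USER format=Account,Partition,QOS%30

# Show the wall-time and GrpTRES limits attached to a specific QoS
sacctmgr show qos [qos_name] format=Name,MaxWall,GrpTRES%40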

II. Node Features

To make it easier for you to find the right hardware for your research, we have labeled each node with certain features like “gpu” (read: has GPUs) or “a100” (read: has A100 GPUs). You can target those features when starting a job on the HPC by using constraints. For more info, check out our guide to job submission or SLURM cheatsheet. Below is a list of the features used on the Storrs HPC and their corresponding descriptions.

Feature Name – Description

cpuonly – standard CPU nodes without GPUs; use these to keep GPU nodes free for GPU-intensive jobs

epyc64 – has the AMD EPYC 7452 architecture

epyc128 – has the AMD EPYC 7713 architecture

gpu – nodes with GPUs

a100 – NVIDIA Tesla A100 GPUs

v100 – NVIDIA Tesla V100 GPUs

l40 – NVIDIA Tesla L40 GPUs (single-precision; priority-gpu partition)
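To see which features are attached to which nodes, you can query Slurm directly. The sketch below uses standard sinfo options and can be run from a login node.

# List each node together with its feature tags (e.g., epyc128, gpu, a100)
sinfo -N -o "%12N %25f"

# Limit the listing to a single partition, e.g. general-gpu
sinfo -N -p general-gpu -o "%12N %25f"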

Node Descriptions

The summary below describes the node architectures generally available on the HPC. Please see our SLURM Cheatsheet for more in-depth guidance on targeting different architectures and amounts of RAM.

Epyc128 (non-OSG partition)
   Cores available: 126
   Memory/RAM available: 503 GB
   Flags for requesting all of the node's memory (if all cores are requested):
      --mem-per-cpu=3G for 378G per node
      --mem-per-cpu=4090M for 503G per node
      --mem-per-cpu=4188680K for 503G per node

Epyc64
   Cores available: 62
   Memory/RAM available: 503 GB
   Flags for requesting all of the node's memory (if all cores are requested):
      --mem-per-cpu=8G for 496G per node
      --mem-per-cpu=8312M for 503G per node
      --mem-per-cpu=8512478K for 503G per node
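For example, to use every available core and essentially all of the memory on a single Epyc128 node in the general partition, the request could look like the sketch below (values taken from the summary above; account and workload details are omitted).

#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --constraint="epyc128"
#SBATCH --ntasks=126             # all available cores on an Epyc128 node
#SBATCH --mem-per-cpu=4090M      # 126 x 4090M ~ 503G, i.e., the full node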

GPU Specifications

GPU Type – Memory per Card

Consumer-grade NVIDIA GTX 1080 Ti – 11.264 GB

Consumer-grade NVIDIA RTX 2080 Ti – 11.264 GB

NVIDIA Tesla V100 – 16.384 GB

NVIDIA Tesla A100 – 40.960 GB

NVIDIA Tesla L40 (priority-gpu) – 46.068 GB
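When a job needs more GPU memory than the consumer-grade cards offer, you can combine a feature constraint with a GPU request. The sketch below assumes GPUs on Storrs HPC are requested with Slurm's generic --gres=gpu syntax and that A100 nodes are reachable from your partition; check the partition list above for what your account can access.

#SBATCH --partition=general-gpu
#SBATCH --constraint="a100"      # target nodes with 40 GB A100 cards (assumes the partition has them)
#SBATCH --gres=gpu:1             # request one GPU (assumes the standard --gres convention)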

III. Job Submission Examples

Generic example: 

#SBATCH --account=[account]        #Specify non-default account
#SBATCH --partition=[partition]    #Specify queue type
#SBATCH --constraint="[feature]"   #Specify node feature
#SBATCH --qos=[qos_name]           #Specify non-default QoS
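Put together, a minimal batch script using these directives might look like the sketch below; the bracketed values, the module name, and the program name are placeholders.

#!/bin/bash
#SBATCH --partition=general
#SBATCH --constraint="epyc128"
#SBATCH --ntasks=1
#SBATCH --time=01:00:00            # stay within the partition's 12-hour limit

module load [software]             # load whatever software the job needs
srun ./[my_program]                # run the workload

Submit the script with sbatch (for example, sbatch myjob.sh) and check on it with squeue -u $USER.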

 

Preempt example: 

#SBATCH --account=ena02002
#SBATCH --partition=preempt
#SBATCH --constraint="epyc128"
#SBATCH --qos=manoslabpreempt

 

General submission to specific node types using defaults: 

#SBATCH --constraint="epyc128"     #general Epyc128 submission
#SBATCH --constraint="a100"        #general submission to a100 gpu nodes

 

Priority GPU submission to an L40 node:
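(The account and QoS names below are placeholders for your group's priority allocation; the GPU request line assumes Slurm's standard --gres syntax.)

#SBATCH --account=[account]
#SBATCH --partition=priority-gpu
#SBATCH --constraint="l40"
#SBATCH --qos=[qos_name]
#SBATCH --gres=gpu:1               # request one L40 card (assumed --gres convention)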