Partitions / Storrs HPC Resources

I. Partitions of the Storrs HPC

Storrs HPC is divided into several partitions. Each partition is a group of nodes that share similar hardware (e.g., GPUs), types of usage (e.g., long jobs vs. short, high-throughput jobs), and/or the level of priority required to access them. All users have access to the general, general-gpu, debug, lo-core, and hi-core partitions; access to the other, priority-based partitions can be purchased.
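If you want to confirm what is available from the cluster itself, Slurm's sinfo command summarizes each partition's time limit, node count, cores, and memory per node. The sketch below uses standard sinfo options and can be run from any login node.

# One-line summary per partition (availability, time limit, node counts)
sinfo -s

# Partition name, time limit, node count, CPUs per node, and memory (MB) per node
sinfo -o "%12P %10l %6D %5c %8m"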

Storrs HPC also offers a wide variety of computational architectures, each with different strengths. Some nodes have many cores, others have large amounts of RAM, and others are paired with GPUs. Selecting optimal hardware can increase the efficiency of your research, but the definition of "optimal" will differ depending on how you're using the HPC. The list below summarizes the resources available in each partition.

List of Partitions: 

general
   Max wall time: 12 hours
   Nodes: 13 Skylake, 41 Epyc64, 148 Epyc128
   Cores available per node*: 34 (Skylake), 62 (Epyc64), 126 (Epyc128)
   Total cores: 442 (Skylake), 2,542 (Epyc64), 18,648 (Epyc128)
   GPUs available per node: n/a
   RAM per node (GB): 187 (Skylake), 503 (Epyc64), 503 (Epyc128)
   Use: General-use, free access. 8 node limit per job.

general-gpu
   Max wall time: 12 hours
   Nodes: 2 Skylake, 28 Epyc64
   Cores available per node*: 34 (Skylake), 62 (Epyc64)
   Total cores: 68 (Skylake), 1,736 (Epyc64)
   GPUs available per node: 1 or 3
   RAM per node (GB): 187 (Skylake), 503 (Epyc64)
   Use: General-use, free access. 2 node limit per job.

preempt
   Max wall time: 12 hours
   Nodes: 7 Epyc128
   Cores available per node*: 126
   Total cores: 882
   GPUs available per node: n/a
   RAM per node (GB): 503
   Use: QoS*-driven; highest priority.

lo-core
   Max wall time: 7 days
   Nodes: 18 Epyc128
   Cores available per node*: 126
   Total cores: 2,268
   GPUs available per node: n/a
   RAM per node (GB): 503
   Use: Long-running serial jobs. 4 node limit per job.

hi-core
   Max wall time: 6 hours
   Nodes: 28 Epyc128
   Cores available per node*: 126
   Total cores: 3,528
   GPUs available per node: n/a
   RAM per node (GB): 503
   Use: Highly parallel jobs. 16 node limit per job.

debug
   Max wall time: 30 minutes
   Nodes: 1, 11, 2
   Architecture: Skylake, Epyc64, Epyc128, Epyc64-A100 GPU
   Cores available per node*: 34 (Skylake), 62 (Epyc64), 126 (Epyc128), 62 (Epyc64-A100 GPU)
   Total cores: 34, 62, 126, 62
   GPUs available per node: n/a, except 1 per Epyc64-A100 GPU node
   RAM per node (GB): 187 (Skylake), 503 (Epyc64), 503 (Epyc128), 503 (Epyc64-A100 GPU)
   Use: Job submission testing. 2 node limit per job.

priority
   Max wall time: Unlimited
   Nodes: QoS*
   Architecture: Skylake, Epyc64, Epyc128
   Cores available per node*: 34 (Skylake), 62 (Epyc64), 126 (Epyc128)
   Total cores: QoS*
   GPUs available per node: n/a
   RAM per node (GB): 187 (Skylake), 503 (Epyc64), 503 (Epyc128)
   Use: All condo/priority CPU jobs.

priority-gpu
   Max wall time: Unlimited
   Nodes: QoS*
   Architecture: Intel^, Skylake-30^, Skylake-34, Epyc64
   Cores available per node*: 20^ (Intel), 30^ (Skylake-30), 34 (Skylake-34), 62 (Epyc64)
   Total cores: QoS*
   GPUs available per node: 2 or 3^ (Intel), 8^ (Skylake-30), 1 or 3 (Skylake-34), 1, 3, or 4 (Epyc64)
   RAM per node (GB): 125^ (Intel), 376^ (Skylake-30), 187 (Skylake-34), 503 (Epyc64)
   Use: All condo/priority GPU jobs.

class
   Max wall time: 4 hours
   Nodes: 12 Skylake, 1 Skylake-GPU, 12 Epyc128
   Cores available per node*: 34 (Skylake), 34 (Skylake-GPU), 126 (Epyc128)
   Total cores: 408 (Skylake), 34 (Skylake-GPU), 1,512 (Epyc128)
   GPUs available per node: n/a, except 1 on the Skylake-GPU node
   RAM per node (GB): 187 (Skylake), 187 (Skylake-GPU), 503 (Epyc128)
   Use: For classroom/instructional use.

osg
   Max wall time: 2 days
   Nodes: QoS*
   Architecture: Epyc64
   Cores available per node*: 62 (124 threads)
   Total cores: 64
   GPUs available per node: n/a
   RAM per node (GB): 503
   Use: OSG only.

*Available cores per node – 2 cores are reserved per node for the OS and storage processes. This does not apply to the Haswell and Broadwell architectures.

*QoS – You can specify a Quality of Service (QoS) for each job submitted to Slurm. The QoS associated with a job affects the group's maximum cumulative core and GPU count (known as Group Trackable RESources, or GrpTRES) as well as job priority. These limits are determined by the number of cores and GPUs a PI purchases under the condo model. You can inspect the QoS values and limits attached to your account with the sacctmgr commands shown after these notes.

^ – denotes nodes with consumer-grade GPUs (see GPU Specifications below).

NOTE – Total core counts depend on the node type/CPU architecture. The 128-core Epyc nodes represent the top end of the range; OSG Epyc nodes have 64 cores. Priority/preempt node assignments overlap with the general partition.
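If your group has purchased priority or preempt access, you can check which QoS values are attached to your account and what limits they carry. The sketch below uses standard sacctmgr queries; the bracketed QoS name is a placeholder for your group's QoS.

# List the account(s), partition(s), and QoS values associated with your user
sacctmgr show assoc user=$USER format=Account,Partition,QOS%30

# Show the wall-time and GrpTRES limits attached to a specific QoS
sacctmgr show qos [qos_name] format=Name,MaxWall,GrpTRES%40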

II. Node Features

To make it easier for you to find the right hardware for your research, we have labeled each node with certain features like “gpu” (read: has GPUs) or “a100” (read: has A100 GPUs). You can target those features when starting a job on the HPC by using constraints. For more info, check out our guide to job submission or SLURM cheatsheet. Below is a list of the features used on the Storrs HPC and their corresponding descriptions.

Feature Name – Description

cpuonly – standard CPU nodes without GPUs; use these to keep GPU nodes free for GPU-intensive jobs

epyc64 – has the AMD EPYC 7452 architecture

epyc128 – has the AMD EPYC 7713 architecture

gpu – nodes with GPUs

a100 – NVIDIA Tesla A100 GPUs

v100 – NVIDIA Tesla V100 GPUs

l40 – NVIDIA Tesla L40 GPUs (single-precision; priority-gpu partition)
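To see which features are attached to which nodes, you can query Slurm directly. The sketch below uses standard sinfo options and can be run from a login node.

# List each node together with its feature tags (e.g., epyc128, gpu, a100)
sinfo -N -o "%12N %25f"

# Limit the listing to a single partition, e.g. general-gpu
sinfo -N -p general-gpu -o "%12N %25f"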

Node Descriptions

The summary below describes the node architectures generally available on the HPC. Please see our SLURM Cheatsheet for more in-depth guidance on targeting different architectures and amounts of RAM.

Epyc128 (non-OSG partition)
   Cores available: 126
   Memory/RAM available: 503 GB
   Flags for requesting all of the node's memory (if all cores are requested):
      --mem-per-cpu=3G for 378G per node
      --mem-per-cpu=4090M for 503G per node
      --mem-per-cpu=4188680K for 503G per node

Epyc64
   Cores available: 62
   Memory/RAM available: 503 GB
   Flags for requesting all of the node's memory (if all cores are requested):
      --mem-per-cpu=8G for 496G per node
      --mem-per-cpu=8312M for 503G per node
      --mem-per-cpu=8512478K for 503G per node
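For example, to use every available core and essentially all of the memory on a single Epyc128 node in the general partition, the request could look like the sketch below (values taken from the summary above; account and workload details are omitted).

#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --constraint="epyc128"
#SBATCH --ntasks=126             # all available cores on an Epyc128 node
#SBATCH --mem-per-cpu=4090M      # 126 x 4090M ~ 503G, i.e., the full node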

GPU Specifications

GPU Type – Memory per Card

Consumer-grade NVIDIA GTX 1080 Ti – 11.264 GB

Consumer-grade NVIDIA RTX 2080 Ti – 11.264 GB

NVIDIA Tesla V100 – 16.384 GB

NVIDIA Tesla A100 – 40.960 GB

NVIDIA Tesla L40 (priority-gpu) – 46.068 GB
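When a job needs more GPU memory than the consumer-grade cards offer, you can combine a feature constraint with a GPU request. The sketch below assumes GPUs on Storrs HPC are requested with Slurm's generic --gres=gpu syntax and that A100 nodes are reachable from your partition; check the partition list above for what your account can access.

#SBATCH --partition=general-gpu
#SBATCH --constraint="a100"      # target nodes with 40 GB A100 cards (assumes the partition has them)
#SBATCH --gres=gpu:1             # request one GPU (assumes the standard --gres convention)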

III. Job Submission Examples

Generic example: 

#SBATCH --account=[account]        #Specify non-default account
#SBATCH --partition=[partition]    #Specify queue type
#SBATCH --constraint="[feature]"   #Specify node feature
#SBATCH --qos=[qos_name]           #Specify non-default QoS
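Put together, a minimal batch script using these directives might look like the sketch below; the bracketed values, the module name, and the program name are placeholders.

#!/bin/bash
#SBATCH --partition=general
#SBATCH --constraint="epyc128"
#SBATCH --ntasks=1
#SBATCH --time=01:00:00            # stay within the partition's 12-hour limit

module load [software]             # load whatever software the job needs
srun ./[my_program]                # run the workload

Submit the script with sbatch (for example, sbatch myjob.sh) and check on it with squeue -u $USER.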

 

Preempt example: 

#SBATCH --account=ena02002
#SBATCH --partition=preempt
#SBATCH --constraint="epyc128"
#SBATCH --qos=manoslabpreempt

 

General submission to specific node types using defaults: 

#SBATCH --constraint="epyc128"     #general Epyc128 submission
#SBATCH --constraint="a100"        #general submission to a100 gpu nodes

 

Priority GPU submission to an L40 node:
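(The account and QoS names below are placeholders for your group's priority allocation; the GPU request line assumes Slurm's standard --gres syntax.)

#SBATCH --account=[account]
#SBATCH --partition=priority-gpu
#SBATCH --constraint="l40"
#SBATCH --qos=[qos_name]
#SBATCH --gres=gpu:1               # request one L40 card (assumed --gres convention)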