Partitions / Storrs HPC Resources
I. Partitions of the Storrs HPC
Storrs HPC is divided into ten partitions. Each partition refers to a group of nodes that share similar hardware (e.g., GPUs), types of usage (e.g., long jobs vs. short high-throughput jobs), and/or levels of priority required to access them. All users have access to the general, general-gpu, debug, lo-core, and hi-core partitions; access to other priority nodes can be purchased.
Storrs HPC also offers a wide variety of computational architectures, each with different strengths. Some nodes have many cores, others have large amounts of RAM, and others are paired with GPUs. Selecting optimal hardware can increase the efficiency of your research, but the definition of “optimal” depends on how you are using the HPC. The table below summarizes the resources available on the HPC.
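If you prefer to check the current partition layout directly from a login node, sinfo reports each partition's time limit, node count, and per-node CPU and memory figures. This is only a sketch using standard sinfo format options; the live output will differ from the table below as nodes are added, drained, or reassigned.

sinfo -o "%P %l %D %c %m"    # partition, time limit, node count, CPUs per node, memory (MB) per node
sinfo -p general-gpu -N -l   # per-node detail for a single partition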
List of Partitions:
Name | Max Wall Time | Nodes | Architecture | Cores available per node* | Total Cores | GPUs available per node | RAM per Node (GB) | Use |
---|---|---|---|---|---|---|---|---|
general | 12 hours | 13 / 41 / 148 | Skylake / Epyc64 / Epyc128 | 34 / 62 / 126 | 442 / 2,542 / 18,648 | n/a | 187 / 503 / 503 | General-use, free access. 8 node limit per job. |
general-gpu | 12 hours | 2 / 28 | Skylake / Epyc64 | 34 / 62 | 68 / 1,736 | 1 or 3 | 187 / 503 | General-use, free access. 2 node limit per job. |
preempt | 12 hours | 7 | Epyc128 | 126 | 882 | n/a | 503 | QoS*-driven; highest priority. |
lo-core | 7 days | 18 | Epyc128 | 126 | 2,268 | n/a | 503 | Long-running serial jobs. 4 node limit per job. |
hi-core | 6 hours | 28 | Epyc128 | 126 | 3,528 | n/a | 503 | Highly parallel jobs. 16 node limit per job. |
debug | 30 minutes | 6 / 1 / 11 / 2 | Skylake / Epyc64 / Epyc128 / Epyc64-A100 GPU | 34 / 62 / 126 / 62 | 204 / 62 / 1,386 / 124 | n/a / n/a / n/a / 1 | 187 / 503 / 503 / 503 | Job submission testing. 2 node limit per job. |
priority | Unlimited | QoS* | Skylake / Epyc64 / Epyc128 | 34 / 62 / 126 | QoS* | n/a | 187 / 503 / 503 | All condo/priority CPU jobs. |
priority-gpu | Unlimited | QoS* | Intel^ / Skylake-30^ / Skylake-34 / Epyc64 | 20^ / 30^ / 34 / 62 | QoS* | 2 or 3^ / 8^ / 1 or 3 / 1, 3, or 4 | 125^ / 376^ / 187 / 503 | All condo/priority GPU jobs. |
class | 4 hours | 12 / 1 / 12 | Skylake / Skylake-GPU / Epyc128 | 34 / 34 / 126 | 408 / 34 / 1,512 | n/a / 1 / n/a | 187 / 187 / 503 | For classroom/instructional use. |
osg | 2 days | QoS* | Epyc64 | 62 (124 threads) | 64 | n/a | 503 | OSG only. |
*Available cores per node – 2 cores are reserved on each node for OS and storage processes. This reservation does not apply to the Haswell and Broadwell architectures.
*QoS – You can specify a Quality of Service (QoS) for each job submitted to Slurm. The QoS associated with a job affects the group's maximum cumulative core and GPU counts (known as Group Trackable RESources, or GrpTRES) and the job's priority. These limits are determined by the number of cores and GPUs a PI purchases under the Condo model; an example of inspecting these limits follows the notes below.
^GPUs – denotes nodes with lower-quality, consumer-grade GPUs.
NOTE – Total cores depend on node type/CPU architecture. The Epyc 128-core nodes represent the top end of the range; OSG Epyc nodes have 64 cores. Priority/preempt node assignments overlap with the general partition.
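To see which QoS limits actually apply to your account, one option is to query Slurm's accounting database. This is a minimal sketch using standard sacctmgr fields; the QoS and account names it returns are site- and group-specific.

sacctmgr show qos format=Name,Priority,GrpTRES,MaxWall          # list QoS definitions and their group TRES limits
sacctmgr show assoc where user=$USER format=Account,User,QOS    # show which QoS values your association can use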
II. Node Features
To make it easier to find the right hardware for your research, we have labeled each node with features like “gpu” (read: has GPUs) or “a100” (read: has A100 GPUs). You can target those features when starting a job on the HPC by using constraints. For more info, check out our guide to job submission or the SLURM cheatsheet. Below is a list of the features used on the Storrs HPC and their corresponding descriptions, followed by a short example of targeting features with constraints.
Feature Name | Description |
---|---|
cpuonly | Standard CPU nodes without GPUs; keeps GPU nodes free for GPU-intensive jobs |
epyc64 | Nodes with the AMD EPYC 7452 architecture |
epyc128 | Nodes with the AMD EPYC 7713 architecture |
gpu | Nodes with GPUs |
a100 | NVIDIA Tesla A100 GPUs |
v100 | NVIDIA Tesla V100 GPUs |
l40 | NVIDIA L40 GPUs (single-precision); priority-gpu partition only |
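As an illustration only (the partition, GPU count, and shell shown here are placeholders to adjust to your own workflow), features can be combined in a batch directive or targeted in an interactive session:

#SBATCH --constraint="epyc128&cpuonly"                                    # batch job: AND two features together
srun --partition=general-gpu --constraint=a100 --gres=gpu:1 --pty bash    # interactive session on an a100 node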
Node Descriptions
This describes the general features available on the nodes of the HPC. Please see our SLURM Cheatsheet for more in-depth guidance on targeting different architectures and amounts of RAM. A sketch of requesting a full node's memory follows the table.
Node Architecture | Cores Available | Memory/RAM Available | Flags for Requesting All of Node’s Memory (if all cores requested) |
---|---|---|---|
Epyc128 (non-OSG partition) | 126 | 503 GB | |
Epyc64 | 62 | 503 GB | |
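The flag column above was not captured here. As one possibility (a sketch based on generic Slurm options rather than the exact flags the table intends), a job that takes all usable cores on an Epyc128 node could also claim all of its memory like this:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=126    # all 126 usable cores on an Epyc128 node
#SBATCH --mem=0                  # --mem=0 asks Slurm for all of the memory on the allocated node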
GPU Specifications
GPU Type | Memory per Card |
---|---|
Consumer-grade NVIDIA GTX 1080 Ti | 11.264 GB |
Consumer-grade NVIDIA RTX 2080 Ti | 11.264 GB |
NVIDIA Tesla V100 | 16.384 GB |
NVIDIA Tesla A100 | 40.960 GB |
NVIDIA L40 (priority-gpu) | 46.068 GB |
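To check which GPU types and counts are present on a partition's nodes before choosing a card, sinfo can list each node's generic resources and features. This is a minimal sketch; the sinfo format options are standard, but which partitions you can usefully query depends on your access.

sinfo -p general-gpu -o "%N %G"        # node names and their GRES (GPU type and count)
sinfo -p priority-gpu -o "%N %G %f"    # %f also lists node features such as a100, v100, l40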
III. Job Submission Examples
Generic example:
#SBATCH --account=[account]        # Specify non-default account
#SBATCH --partition=[partition]    # Specify queue type
#SBATCH --constraint="[feature]"   # Specify node feature
#SBATCH --qos=[qos_name]           # Specify non-default QoS
Preempt example:
#SBATCH --account=ena02002
#SBATCH --partition=preempt
#SBATCH --constraint="epyc128"
#SBATCH --qos=manoslabpreempt
General submission to specific node types using defaults:
#SBATCH --constraint="epyc128" #general Epyc128 submission
#SBATCH --constraint="a100" #general submission to a100 gpu nodes
Priority GPU submission to an L40 node:
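The example itself was not captured above; a minimal sketch, assuming your group owns L40 nodes (the account and QoS values below are placeholders, not real names):

#SBATCH --account=[account]        # your condo/priority account
#SBATCH --partition=priority-gpu
#SBATCH --constraint="l40"         # target the single-precision L40 nodes
#SBATCH --qos=[qos_name]           # the QoS tied to your group's purchase
#SBATCH --gres=gpu:1               # request one L40 GPU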