The HPC team is updating this page. Check back for new information.
Quarterly Planned Downtime
Storrs HPC conducts cluster maintenance on a regular schedule throughout the calendar year. On the days listed below, the cluster is scheduled for downtime so that these tasks can be carried out. Downtime begins at 7 AM and is planned to last 8 hours, although the actual duration may vary.
...
· Firmware updates
· Kernel updates
· Network maintenance
· Major application reconfiguration
...
Regularly scheduled maintenance days are listed below:
Scheduled Maintenance Days |
---|
Third Thursday of February |
Third Thursday of May |
Third Thursday of August |
Third Thursday of November |
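The "third Thursday" rule above can be turned into concrete dates for a given year. The following is a minimal sketch using only the Python standard library. The function names `third_thursday` and `quarterly_downtime_windows` are hypothetical and not part of any Storrs HPC tooling. The end time is only the planned 8-hour mark, since the actual duration may vary, and times are naive local times.

```python
import calendar
from datetime import date, datetime, time, timedelta

# Months with quarterly downtime, as listed in the table above.
MAINTENANCE_MONTHS = (2, 5, 8, 11)


def third_thursday(year: int, month: int) -> date:
    """Return the date of the third Thursday of the given month."""
    # monthcalendar() yields weeks starting on Monday; days that fall
    # outside the month are reported as 0, so they are filtered out.
    thursdays = [
        week[calendar.THURSDAY]
        for week in calendar.monthcalendar(year, month)
        if week[calendar.THURSDAY] != 0
    ]
    return date(year, month, thursdays[2])


def quarterly_downtime_windows(year: int) -> list[tuple[datetime, datetime]]:
    """Return (planned start, planned end) pairs for the year's downtimes."""
    windows = []
    for month in MAINTENANCE_MONTHS:
        # Downtime begins at 7 AM and is planned (not guaranteed) to last 8 hours.
        start = datetime.combine(third_thursday(year, month), time(7, 0))
        windows.append((start, start + timedelta(hours=8)))
    return windows


if __name__ == "__main__":
    for start, end in quarterly_downtime_windows(2024):
        print(f"{start:%A %Y-%m-%d %H:%M} to {end:%H:%M} (planned)")
```

A list like this can help when choosing job wall-time limits so that long jobs are not still running when a downtime window opens.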
Weekly Planned Maintenance
Weekly maintenance does not require downtime. Jobs typically continue to run and the scheduler remains available, so no advance notice is provided except under special circumstances. Tasks include, but are not limited to:
...
The following tasks are performed on a regular basis:
Login nodes reboot and update every Wednesday at 6:00 am.
Head nodes update every Tuesday at 10:00 am EDT/EST.
OSG nodes reboot and update every other Thursday at 10:30 am EDT/EST.
25% of the compute and GPU nodes are scheduled for reboots and updates every Tuesday and Thursday, and reboot as jobs finish. (Effectively, every node is scheduled to reboot every 2 weeks.)
Example calendar:
| Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
|---|---|---|---|---|---|---|
| | | 1/4 cluster scheduled reboots | Login nodes reboot | 1/4 cluster scheduled reboots | | |
| | | 1/4 cluster scheduled reboots | Login nodes reboot | 1/4 cluster scheduled reboots; OSG nodes reboot | | |

This two-week pattern then repeats.
Unplanned Maintenance
Occasionally, Storrs HPC is forced to conduct unscheduled maintenance that may result in, or result from, a temporary outage. Such instances may include, but are not limited to:
Critical security updates
Hardware failures
Infrastructure-related outages
Extreme weather
When such instances occur, Storrs HPC will notify its user base as soon as possible via the STORRS-HPC_L Listserv email, and will provide status updates throughout the outage.
Please forward any questions to hpc@uconn.edu