...
Short answer: Your job is being “held.” To get it running again, release the job back into the job queue.

Long answer: Your job failed. We have a separate FAQ on figuring out why a job failed, but here we will focus on why your job is being held.

Jobs that failed used to be re-queued automatically. This was a problem for a number of users because re-running the job would overwrite their previous data. In January 2024, we re-configured SLURM to prevent this problem. Now, when a job fails, it is not immediately re-queued. Instead, it is “held” out of the queue until the submitting user “releases” it back into the queue, which makes re-queueing a conscious choice. You can re-queue jobs using the commands below:
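A minimal sketch of listing and releasing held jobs, assuming the standard SLURM tools `squeue` and `scontrol` and assuming held jobs carry the “SE” state code mentioned on this page. The helper names `list_held_jobs` and `release_jobs` and the `SQUEUE`/`SCONTROL` override variables are hypothetical, added only for illustration and dry runs. For a single job, the underlying command is `scontrol release <jobid>`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the SQUEUE/SCONTROL override variables are
# hypothetical; by default they point at the real SLURM commands.
SQUEUE="${SQUEUE:-squeue}"
SCONTROL="${SCONTROL:-scontrol}"

# Print the IDs of your jobs in the "SE" (SPECIAL_EXIT) state, one per line.
# -h: no header, -u: only this user, -t: filter by state, -o "%i": job ID only.
list_held_jobs() {
    "$SQUEUE" -h -u "$USER" -t SE -o "%i"
}

# Release one or more held jobs back into the queue.
# scontrol release accepts a comma-separated list of job IDs.
release_jobs() {
    if [ "$#" -eq 0 ]; then
        echo "usage: release_jobs JOBID [JOBID ...]" >&2
        return 1
    fi
    local ids
    ids=$(IFS=,; echo "$*")   # join the arguments with commas
    "$SCONTROL" release "$ids"
}
```

For example, `release_jobs 123456 123457` releases two jobs, and `release_jobs $(list_held_jobs)` releases every job of yours currently in the “SE” state. Since re-running a job can overwrite earlier output, it may be worth checking the submission script before releasing.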
If you release your jobs into the queue and they keep ending up back in the “held” state, something is likely failing within your submission script. In that case, cancel your jobs and start troubleshooting. Please note that jobs left in the queue in the “SE” state will be cancelled after seven days.

Please feel free to contact us at hpc@uconn.edu with any questions or concerns.
...