...
Data deletion inside the /scratch folder is based on file modification time. Scratch data is transient and is purged after 60 days. Once data is no longer needed for computation, it should be transferred immediately to /shared or ARCHIVE. Do not use the scratch file system (/scratch) for long-term storage.

Certain directories are mounted only on demand by autofs. These directories are /home and /shared. Shell commands such as ls may fail on these directories until they are mounted. A directory is mounted only when you access a file under it or use cd to enter its directory structure.

HPC no longer uses /archive folders. Instead, aged data stored on /shared is moved to a long-term tier on the backend. This process is automatic and invisible to you.

If you need more space, you can try creating compressed archives (that is, “tarballs”) of large folders using a command similar to:
tar -zcvf compressedFileName.tar.gz folderToCompress
You can then list the files stored in the tarball (and pipe the output to grep to search it) using:
tar -tzvf compressedFileName.tar.gz
(For all options, see the tar man page with the man tar command.)
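The tar workflow above can be wrapped in two small shell functions. This is a minimal sketch assuming GNU tar and grep; make_tarball and find_in_tarball are hypothetical helper names, not commands provided by HPC. It drops -v so that only errors are printed, and it reads the finished archive back before you would consider deleting the original folder to free space.

```shell
# Hypothetical helper: make_tarball FOLDER ARCHIVE
# Compresses FOLDER into the gzip-compressed tarball ARCHIVE, then reads it back.
make_tarball() {
  local folder="$1"
  local archive="$2"
  # -z: gzip, -c: create, -f: archive file name (add -v to print each file added)
  tar -zcf "$archive" "$folder" || return 1
  # Listing reads and decompresses the entire archive, so a truncated or
  # corrupted file should make this step fail.
  tar -tzf "$archive" > /dev/null || return 1
  # Only after this succeeds would you free space with: rm -rf "$folder"
}

# Hypothetical helper: find_in_tarball ARCHIVE PATTERN
# Lists the paths stored in ARCHIVE and keeps only those matching PATTERN.
find_in_tarball() {
  local archive="$1"
  local pattern="$2"
  tar -tzf "$archive" | grep -- "$pattern"
}
```

For example, find_in_tarball compressedFileName.tar.gz results.csv prints the stored path of any member whose name contains results.csv, without extracting anything.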
Long-Term Data Storage
Once data is no longer needed for computation, it should be transferred off /scratch to a permanent data storage location under /shared. Do not use the scratch file system (/scratch) for long-term storage: it is optimized for fast parallel access from multiple computers, and its space is too scarce and too expensive to hold data long term.
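Because deletion on /scratch is keyed to file modification time, you can check ahead of time which files are getting close to the purge. The sketch below assumes GNU find and assumes the purge looks only at each file's modification time; purge_candidates, the 50-day warning threshold, and the /scratch/$USER path used in the example are illustrative assumptions, not documented HPC settings.

```shell
# Hypothetical helper: purge_candidates DIR [DAYS]
# Prints regular files under DIR whose modification time is more than DAYS
# days old. Assumes the /scratch purge keys on mtime, as described above.
purge_candidates() {
  local dir="$1"
  local days="${2:-50}"  # hypothetical warning threshold, below the 60-day purge
  # -mtime +N matches files whose age, rounded down to whole days, exceeds N.
  find "$dir" -type f -mtime +"$days" -print
}
```

For example, purge_candidates /scratch/$USER 50 lists files that have gone unmodified for more than 50 days; those are the first candidates to move to /shared.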
If you need more storage than your /home directory provides (or your group's /shared directory, for groups that use one), your PI can request a /shared/NetID/ARCHIVE folder. This is a reliable tiered file system that performs almost as well as the other near-line storage in HPC.

Once you have obtained access to an ARCHIVE folder, you can back up data to it in two ways. The faster way is to run standard Unix utilities (such as cp and tar) on the HPC nodes; this is suitable for small transfers. The best way to transfer larger data sets is the Globus service. Globus performance depends on system traffic and network performance; however, Globus transfer jobs continue after you disconnect, which makes the service better suited to large transfers.
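For small transfers done with standard Unix utilities, a cautious pattern is to copy, compare, and only then delete the original. This is a minimal sketch assuming GNU coreutils and diff; copy_and_verify and the example paths are hypothetical, and the function deliberately leaves the source in place.

```shell
# Hypothetical helper: copy_and_verify SRC DEST_PARENT
# Copies directory SRC into DEST_PARENT, then compares the two trees.
# The source is left in place; delete it yourself once this returns 0.
copy_and_verify() {
  local src="$1"
  local dest_parent="$2"
  mkdir -p "$dest_parent" || return 1
  # -a: recurse and preserve permissions, timestamps and symbolic links
  cp -a "$src" "$dest_parent"/ || return 1
  # diff -r returns non-zero if any file differs or is missing from the copy
  diff -r "$src" "$dest_parent/$(basename "$src")"
}
```

For example, copy_and_verify /scratch/$USER/results /shared/NetID/ARCHIVE/project1 would create /shared/NetID/ARCHIVE/project1/results; once it returns successfully, you can remove the scratch copy.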
Name | Path | Default Quota | Relative Performance | Persistence | Backed up? | Purpose
Archive | /shared/NetID/ARCHIVE | None | Fast/Archival | Yes | On Request |
For information on how to best organize your backups, see our page on Backing Up Your Data.
...