...

  • Data deletion inside the /scratch folder is based on file modification time.

  • Scratch data is transient and is purged after 60 days. Once data is no longer needed for computation, it should be immediately transferred to /shared or ARCHIVE. Do not use the scratch file system (/scratch) for long-term storage.

  • Certain directories are mounted on demand by autofs: /home and /shared. Shell commands such as ls on these directories may fail, because they are mounted only when a file beneath them is accessed or when you cd into the directory tree.

  • HPC no longer uses /archive folders. Instead, aged data stored on /shared is moved to a long-term tier on the back end. This process is automatic and invisible to you.

  • If you are in need of more space, you can create compressed archives (i.e., “tarballs”) of large folders using a command similar to tar -zcvf compressedFileName.tar.gz folderToCompress. You can then list the files within the tarball using tar -tzvf compressedFileName.tar.gz. (See the tar man page with the man tar command.)
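
The tarball commands above can be sketched as a small end-to-end workflow. The folder and file names below are hypothetical, chosen only for illustration:

```shell
# Create a sample folder to stand in for a large results directory
mkdir -p results/run1
echo "sample output" > results/run1/output.txt

# Compress the folder into a gzip-compressed tarball
tar -zcvf results.tar.gz results

# List the tarball's contents without extracting it
tar -tzvf results.tar.gz

# Once the tarball is verified, the original folder can be removed
# to reclaim space, and extracted again later when needed
rm -r results
tar -zxvf results.tar.gz
```

To search for a particular file inside a tarball, pipe the listing through grep, e.g. tar -tzf results.tar.gz | grep output.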

Long Term Data Storage

Once data is no longer needed for computation, it should be transferred off /scratch to a permanent data storage location under /shared. Do not use the scratch file system (/scratch) for long-term storage: it is optimized for fast parallel access from multiple computers, and it is too scarce and too expensive to hold data long term.

If you need more storage than is provided by your /home directory (or /shared directory, for those groups that use one), your PI can request a /shared/NetID/ARCHIVE folder. This is a reliable, tiered file system that performs almost as well as the other near-line storage in HPC.

Once you have obtained access to an ARCHIVE folder, you can transfer data to it in two ways. The faster way uses the standard Unix utilities (such as cp, tar, etc.) run on the HPC nodes, and is suitable for small transfers. The best way to transfer larger data sets is the Globus service. Globus performance depends upon system traffic and network performance; however, Globus jobs persist after you disconnect, which makes the service better suited to large transfers.
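
A small transfer with the standard Unix utilities might look like the following sketch. The paths use a temporary directory so the example can run anywhere; on the cluster you would substitute your actual /scratch and /shared/NetID/ARCHIVE paths:

```shell
# Stand-in directories for the scratch and archive file systems
# (hypothetical paths; replace with your real /scratch and ARCHIVE paths)
mkdir -p /tmp/xfer-demo/scratch/results /tmp/xfer-demo/archive
echo "final data" > /tmp/xfer-demo/scratch/results/data.txt

# cp -a copies recursively while preserving timestamps and permissions
cp -a /tmp/xfer-demo/scratch/results /tmp/xfer-demo/archive/

# Verify the copy succeeded before freeing the scratch space
diff -r /tmp/xfer-demo/scratch/results /tmp/xfer-demo/archive/results \
  && rm -r /tmp/xfer-demo/scratch/results
```

Verifying with diff -r before deleting from /scratch guards against an incomplete copy destroying the only remaining copy of the data.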

...

Name    | Path                  | Default Quota | Relative Performance | Persistence | Backed up? | Purpose
Archive | /shared/NetID/ARCHIVE | None          | Fast/Archival        | Yes         | On Request | ...

...

For information on how to best organize your backups, see our page on Backing Up Your Data.

...