Files, Storage, and Securing your Data

On the cluster, you have several options available to you regarding where files are stored. This page discusses the various options, and the differences between them in terms of performance, backup, policies, and use for archiving of data.

  1. Your home directory
  2. Data directories
  3. Scratch space
  4. Using lustre
    1. Lustre and striping
  5. Archival storage
  6. Securing your data
  7. Policies regarding usage of Disk Space on the Deepthought HPCCs

Your home directory

Your home directory is private to you, and should be used as little as possible for data storage. In particular, you should NOT run jobs out of your home directory --- run your jobs from the lustre filesystem; this is optimized to provide better read and write performance to improve the speed of your job. After the job is finished, you might wish to copy the more critical results files back to your home directory, which gets backed up nightly. (The /data and lustre filesystems are NOT backed up.)

Do not run jobs out of your home directory, or run jobs doing extensive I/O from your home directory, as it is NOT optimized for that.
Your home directory is the ONLY directory that gets backed up by the Division of IT. You should copy your precious, irreplaceable files (custom codes, summarized results, etc) here.

Home directories are limited by a 10 GB "soft quota" policy. Because storage needs can vary dramatically over the span of a few days, the policy allows some flexibility. There is no hard limit (short of available disk space) on how much you can store in your home directory, but your usage is monitored daily. If it exceeds 10 GB, you will receive an email informing you of this and asking you to rectify it. You are given 7 days to bring your usage back under 10 GB, so if you need to use more than 10 GB in your home directory for a few days, the policy allows for that. After 7 days, however, the email will be stronger in tone and will also go to systems staff. At that point, you must bring your home directory usage below 10 GB as soon as possible or be in violation of our policy. Failure to comply in a timely manner can lead to the loss of privileges to use the cluster.
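Since usage is checked daily against the 10 GB threshold, it can be handy to check it yourself before the monitoring email arrives. The following is only a sketch: the function name check_quota is made up for illustration, and it assumes that "10 GB" can be taken as 10 * 1024 * 1024 KiB, which may not match exactly how the monitoring measures usage.

```shell
#!/bin/sh
# Hypothetical helper (not a cluster-provided command).
# check_quota DIR LIMIT_KB: report whether DIR uses more than LIMIT_KB KiB.
check_quota() {
    dir=$1
    limit_kb=$2
    # du -sk prints the total size in 1 KiB blocks, followed by the path.
    used_kb=$(du -sk "$dir" 2>/dev/null | awk '{print $1; exit}')
    if [ -z "$used_kb" ]; then
        echo "ERROR: could not measure $dir" >&2
        return 2
    fi
    if [ "$used_kb" -gt "$limit_kb" ]; then
        echo "OVER: $dir uses $used_kb KiB (limit $limit_kb KiB)"
        return 1
    fi
    echo "OK: $dir uses $used_kb KiB (limit $limit_kb KiB)"
    return 0
}

# Example (assumes 10 GB = 10485760 KiB):
#   check_quota "$HOME" 10485760
```

The function returns a non-zero status when the directory is over the limit, so it can also be used in a conditional.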

Data directories

The Deepthought cluster has a large (over 100 TB) amount of disk space available for the use of active jobs. The Deepthought2 cluster has over 1 PB of disk space available for the use of active jobs. Active means relating to jobs running, or in the queue waiting to run, or ongoing research for which you are regularly submitting jobs. It is NOT meant for archival storage of any type; see the Archival storage section if you need archival storage.

The lustre filesystems and DIT-provided NFS data volumes are for the storage of files supporting active research on the cluster only. They are NOT for archival storage. Files more than 6 months old on lustre or the data volumes are subject to deletion without notice by systems staff.
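To see which of your files might fall under the 6 month rule, something like the following could be used. This is a sketch under two assumptions that the policy text does not confirm: that "6 months" can be approximated as 180 days, and that age is judged by last modification time. The function name list_old_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not a cluster-provided command).
# list_old_files DIR [DAYS]: list regular files under DIR whose contents
# were last modified more than DAYS days ago (default 180, an assumed
# approximation of "6 months").
list_old_files() {
    dir=$1
    days=${2:-180}
    find "$dir" -type f -mtime +"$days" -print
}

# Example:
#   list_old_files /lustre/johndoe
```

Review the list before deleting or archiving anything; the function only prints paths and never removes files.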

There are two types of data storage available on the original Deepthought cluster. There is about 6 TB of NFS storage, and over 100 TB (and growing) of lustre storage. All of these are network filesystems accessible from all of the compute nodes. On the Deepthought2 cluster, only home directories are kept on NFS; everything else is in lustre.

Because much of the data generated on the cluster is of a transient nature and because of its size, data stored in the /data and Lustre partitions is not backed up. This data resides on RAID protected filesystems, but there is always a small chance of loss or corruption. If you have critical data that must be saved, be sure to copy it elsewhere.

There are several general purpose areas that are intended for storage of computational data. These areas are accessible to all users of the cluster and as such you should be sure to protect any files or directories you create there. See Securing Your Data for more information.

The data volumes and lustre storage listed below are NOT BACKED UP. Any valuable data should be copied elsewhere (home directory or off cluster) to prevent loss of critical data due to hardware issues. You are responsible for backing up any valuable data.
The areas are:

Path                Filesystem Type    Approximate Size
The following filesystems are available to all Deepthought2 users:
/lustre             Lustre             1.1 PB
The following filesystems are available to all Deepthought users:
/export/lustre_1    Lustre             137 TB

The paths above are for the root of the filesystem. You should use a subdirectory with the name of your username beneath the listed directories. These should already exist for you in lustre, but you might need to create it (with mkdir command) on some of the other data volumes. E.g., if your username is johndoe, your lustre directory on Deepthought2 would be /lustre/johndoe.
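If your subdirectory does not exist yet, it can be created and protected in one step. The sketch below uses a made-up function name, make_user_dir, and assumes you want the directory private to yourself (mode 700); members of a group who share data would want looser permissions.

```shell
#!/bin/sh
# Hypothetical helper (not a cluster-provided command).
# make_user_dir ROOT [USER]: create ROOT/USER if needed, make it accessible
# only to its owner, and print the resulting path.
make_user_dir() {
    root=$1
    # Default to the name of the user running the command.
    user=${2:-$(id -un)}
    mkdir -p "$root/$user" && chmod 700 "$root/$user" && echo "$root/$user"
}

# Example:
#   make_user_dir /lustre
```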

The NFS on RAID5 filesystems on the Deepthought HPC cluster are DEPRECATED. The disks and related hardware are old and out of warranty/support, and likely to fail. Data is NOT backed up. Use at your own risk.

Please remember that you are sharing these filesystems with other researchers and other groups. If you have data residing there that you don't need, please remove it promptly. If you know you are going to create large files, make sure there is sufficient space available in the filesystem you are using. You can check this yourself with the df command:

login-1:~: df -h /lustre
Filesystem            Size  Used Avail Use% Mounted on
                      1.1P  928T  120T  89% /lustre

This output shows that there are currently 120 TB of free space available on /lustre.

To see how much space is currently being used by a particular directory, use the du command:

login-1:~: du -sh /lustre/bob
1.5T    /lustre/bob

This output shows that the directory /lustre/bob is currently using 1.5 TB of space.
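The advice to make sure there is sufficient space before creating large files can be scripted roughly as follows. This is a minimal sketch: the function name has_space is hypothetical, and it relies on the POSIX output format of df -P, where the fourth column of the second line is the available space.

```shell
#!/bin/sh
# Hypothetical helper (not a cluster-provided command).
# has_space DIR NEEDED_KB: succeed if the filesystem holding DIR has at
# least NEEDED_KB KiB available.
has_space() {
    dir=$1
    needed_kb=$2
    # df -P forces one line per filesystem; -k reports sizes in KiB.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 {print $4}')
    # Treat a failed measurement as "not enough space".
    [ -n "$avail_kb" ] || return 1
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Example: require roughly 1 TiB (1073741824 KiB) before starting:
#   has_space /lustre 1073741824 || echo "not enough free space on /lustre"
```

Keep in mind that free space on a shared filesystem can change while your job runs, so this is only a check at one moment in time.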

There are no pre-set limits on how much data you can store on lustre. However, to ensure there is adequate space for everyone using the cluster, this space is only to be used for storing files in active use for jobs and projects currently running on the system. I.e., when your job is finished, remove all the data files, etc. that are no longer needed. The /data/dt-archive* volumes are available if you need to retain the data for longer periods of time, although we ask that you remember to remove data no longer needed from there as well.

Although there are no hard and fast limits on disk usage on the lustre filesystems, when the disks fill up, we will send emails to the people consuming the most space on the disks in question requesting that they clean up, removing any unneeded files and moving files off the disks as appropriate. Timely compliance with such requests is required to ensure the cluster remains usable for everyone; failure to do so is in violation of cluster policies and can result in loss of privileges to use the cluster.

These emails will be sent to your email address; you are required to receive and respond to emails sent to that address. If you prefer using a different email address, be sure to have your address forward to that address. Contact the Division of IT helpdesk if you need assistance with that.

If you have a Glue account and you want to share your data back and forth with that account, you can access it at /glue_homes/<username>. Note that you cannot have jobs read or write directly from your Glue directory; you'll need to copy data back and forth by hand as needed.

Scratch space

It is not uncommon for jobs to require a fair amount of temporary storage. All of the nodes on the original Deepthought cluster have between 1 GB and 250 GB of local scratch space available, with most nodes having at least 30 GB. For Deepthought2, all nodes should have over 750 GB of scratch space available. This space is mounted as /tmp, and is accessible by all processes running on that node. It is NOT available to processes running on different nodes.

Scratch space is temporary. It will be deleted once your jobs complete --- if there is anything you need to save, you must copy it out of scratch space to a data directory, etc. before the job completes. It is NOT backed up.
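Because scratch space is deleted once your jobs complete, a job script needs to copy its results out before it exits. One possible pattern is sketched below. The function name run_in_scratch and the SCRATCH_ROOT variable are made up for illustration; the only thing taken from this page is that scratch space is mounted at /tmp.

```shell
#!/bin/sh
# Hypothetical helper (not a cluster-provided command).
# run_in_scratch RESULTS_DIR COMMAND [ARGS...]: run COMMAND inside a private
# directory under scratch, then copy everything it left there to RESULTS_DIR.
run_in_scratch() {
    results_dir=$1
    shift
    # SCRATCH_ROOT is an assumed override; the default is /tmp.
    scratch_root=${SCRATCH_ROOT:-/tmp}
    workdir=$(mktemp -d "$scratch_root/job.XXXXXX") || return 1
    # Run the command in a subshell so the caller's directory is unchanged.
    ( cd "$workdir" && "$@" )
    status=$?
    # Copy results out even if the command failed, so partial output survives.
    mkdir -p "$results_dir" && cp -R "$workdir"/. "$results_dir"/ || status=1
    rm -rf "$workdir"
    return $status
}

# Example (my_simulation is a placeholder for your own program):
#   run_in_scratch /lustre/johndoe/run42 my_simulation --input config.in
```

The copy happens whether or not the command succeeded, and the function returns the command's exit status, or 1 if the copy itself failed.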

Because scratch space is local to the system, it is usually quite fast. Lustre storage in theory can be faster, but because that is shared by many users and jobs, scratch space is usually faster than lustre in practice, and typically has rather consistent performance (it can be affected by other jobs running on the system, though these should only be your jobs).

See for information on how to specify the amount of scratch space needed by your job.

Using lustre

Lustre is a high performance distributed file system designed for HPC clusters. Files are distributed among multiple servers; in some cases, even different parts of the same file are on different servers. Spreading the load across multiple file servers allows for the faster responses to file requests required to deal with the heavy load some parallel codes demand.

Every user is given a lustre directory, located at /lustre/USERNAME, where USERNAME is your username. (On the original Deepthought cluster, this used to be /export/lustre_1/USERNAME; both Deepthought clusters now have lustre mounted at /lustre, but on the original cluster a symlink has been created to allow the old paths to still work.) Your lustre directory is visible from the login nodes AND from all of the compute nodes. Note that the lustre filesystems on the two Deepthought clusters are distinct, and files on one are not available on the other unless you manually copy them.

For the most part, you can use lustre as you would any other filesystem; the standard unix commands work, and you should just notice better performance in IO heavy codes.

Normally, lustre will keep the data for an individual file on the same fileserver, but will distribute your files across the available servers. The lfs getstripe and lfs setstripe commands can be used to control the striping. More information can be found in the section on Lustre and striping.

Lustre stores the "metadata" about a file (its name, path, etc) separately from the data. Normally, the IO intensive applications contact the metadata server (MDS) once when opening the file, and then contact the object storage servers (OSSes) as they do the heavy IO. This generally improves performance for these IO heavy applications.

Certain common interactive tasks, e.g. ls -l, require data from both the MDS and the OSSes, and take a bit longer on lustre. Again, these are not the operations lustre is optimized for, as they are not done frequently in IO heavy codes.

The lfs find command is a version of find optimized for lustre. It tries to avoid system calls that require information from the OSSes in addition to the MDS, and so generally will run much faster than the unoptimized find command. Usage is by design similar to the standard find command.
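Since the usage of lfs find is similar to the standard find, a script can pick whichever is available. The wrapper below is only a sketch: the function name lustre_find is hypothetical, and it assumes the options you pass (such as -type, -name and -mtime) are among those accepted by both commands.

```shell
#!/bin/sh
# Hypothetical wrapper (not a cluster-provided command).
# lustre_find ARGS...: run "lfs find" when the lfs tool is installed,
# otherwise fall back to the standard find with the same arguments.
lustre_find() {
    if command -v lfs >/dev/null 2>&1; then
        lfs find "$@"
    else
        find "$@"
    fi
}

# Example: regular files not modified for more than 180 days
#   lustre_find /lustre/johndoe -type f -mtime +180
```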

If you want to see how much space you are currently using in any of the Lustre filesystems, run the command lustre_usage. This will show you total usage for yourself and for any groups you belong to. Note that this will only show you Lustre usage, and will not include any files outside of Lustre.

login-1:~: lustre_usage
Usage for /export/lustre_1:

Username     Space Used   Num Files   Avg Filesize
rubble             2.3T     4134684    607.7K

Group        Space Used   Num Files   Avg Filesize
flint              4.6T     6181607    795.4K

Lustre and striping

As mentioned previously, lustre gets its speed by "striping" files over multiple Object Storage Targets (OSTs); basically multiple fileserver nodes each of which holds a part of the file. This is mostly transparent to the user, so you would not normally know whether your file is split over multiple OSTs.

By default on the Deepthought clusters, every file is kept on a single OST, and this striping just means that different files are more or less randomly spread across different file servers/OSTs. This is fine for files of moderate size, but might need adjustment if dealing with files of size 10 or 100 GB or more. The lfs getstripe and lfs setstripe commands exist for this.

The getstripe subcommand is the simplest, and just gives information about the striping of a file or directory. Usage is just lfs getstripe FILEPATH and it prints out information about the named file's striping. E.g.:

login-1> lfs getstripe test.tar
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  9
        obdidx           objid           objid           group
             9         2549120       0x26e580                0

The above example shows a file created using default settings. The file in this case is on a single OST (the number of stripes for the file, given by lmm_stripe_count, is 1). The lmm_stripe_offset gives the index of the starting OST, in this case 9, and the lines below that list all the stripes (in this case, just the single one). One can use the command lfs osts to correlate the index to the name of an actual OST. The lmm_stripe_size value is the size of the stripe in bytes, in this case 1048576 bytes or 1 MiB.
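If you want to use these values in a script, the getstripe output can be picked apart with awk. The following sketch assumes the output has the same layout as the example above (field names such as lmm_stripe_count: at the start of a line); the function names stripe_count and stripe_size_mib are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers that read "lfs getstripe" output on standard input.

# stripe_count: print the value of the lmm_stripe_count field.
stripe_count() {
    awk '$1 == "lmm_stripe_count:" {print $2; exit}'
}

# stripe_size_mib: print the lmm_stripe_size field converted from bytes
# to MiB (1 MiB = 1048576 bytes).
stripe_size_mib() {
    awk '$1 == "lmm_stripe_size:" {print $2 / 1048576; exit}'
}

# Example:
#   lfs getstripe test.tar | stripe_count
#   lfs getstripe test.tar | stripe_size_mib
```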

While examining a file's striping parameters is nice, it is not particularly useful unless one can also change it, which can be done with the lfs setstripe subcommand. Actually, the striping for a file is NOT MUTABLE, and is set in stone at the time of file creation. So one needs to use the setstripe subcommand before the file is created. E.g., to create our test.tar file again, this time striped over 20 OSTs and using a stripe size of 10 MiB, we could do something like:

login-1> rm test.tar
login-1> lfs setstripe -c 20 -S 10m test.tar
login-1> ls -l test.tar
-rw-r--r-- 1 payerle glue-staff 0 Sep 18 17:02 test.tar
login-1> tar -cf test.tar ./test
login-1> ls -l test.tar
-rw-r--r-- 1 payerle glue-staff 8147281920 Sep 18 17:04 test.tar
login-1> lfs getstripe test.tar
lmm_stripe_count:   20
lmm_stripe_size:    10485760
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  55
        obdidx           objid           objid           group
            55       419995932     0x1908a11c                0
            63       468577296     0x1bedec10                0
            45       419403761     0x18ff97f1                0
            68       435440970     0x19f44d4a                0
            57       409176967     0x18638b87                0
            44       377767950     0x1684480e                0
            61       419414421     0x18ffc195                0
            65       356701609     0x1542d5a9                0
            31       408705898     0x185c5b6a                0
            12       429746020     0x199d6764                0
            50       379985276     0x16a61d7c                0
            16       372211487     0x162f7f1f                0
            46       468289628     0x1be9885c                0
            10       402610097     0x17ff57b1                0
            30       425031271     0x19557667                0
            60       423186185     0x19394f09                0
            69       496205056     0x1d937d00                0
            35       409685517     0x186b4e0d                0
            70       415859549     0x18c9835d                0
            15       449399811     0x1ac94c03                0

We start by deleting the previously created test.tar; this is necessary because one cannot use lfs setstripe on an existing file. We then use the -c option to setstripe to set the stripe count, and the -S option to set the stripe size, in this case 10 MiB. One can also use the suffixes 'k' for KiB, or 'g' for GiB. The setstripe subcommand creates an empty file with the desired striping parameters. We then issue the tar command to put content in the file, and then run the getstripe subcommand to confirm the file has the correct striping.

As mentioned before, one cannot use the setstripe subcommand on an existing file. So what if we want to change the striping of an existing file? E.g., what if we decide now we want test.tar to have 5 stripes of size 1 GiB? Because we cannot directly change the striping of an existing file, we need to use setstripe to create a new file with the desired striping, and copy the old file to the new file (you can then delete the old file and rename the new file to the old name if desired). E.g.

login-1> lfs getstripe test.tar
lmm_stripe_count:   20
lmm_stripe_size:    10485760
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  55
        obdidx           objid           objid           group
            55       419995932     0x1908a11c                0
            63       468577296     0x1bedec10                0
login-1>  ls -l test2.tar
ls: cannot access test2.tar: No such file or directory
login-1> lfs setstripe -c 5 -S 1g test2.tar
login-1> ls -l test2.tar
-rw-r--r-- 1 payerle glue-staff 0 Sep 18 17:16 test2.tar
login-1> cp test.tar test2.tar
login-1> ls -l test2.tar
-rw-r--r-- 1 payerle glue-staff 8147281920 Sep 18 17:17 test2.tar
login-1> diff test.tar test2.tar; echo >/dev/null Make sure they are the same
login-1> lfs getstripe test2.tar; echo >/dev/null Verify striping
lmm_stripe_count:   5
lmm_stripe_size:    1073741824
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  61
        obdidx           objid           objid           group
            61       419416513     0x18ffc9c1                0
            31       408708503     0x185c6597                0
            66       422684037     0x1931a585                0
            49       429032715     0x1992850b                0
            16       372213361     0x162f8671                0
login-1> rm test.tar; mv test2.tar test.tar
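The manual steps above (create an empty file with the new striping, copy, compare, rename) could be wrapped in a small function. This is a sketch only: the function name restripe and the temporary file naming are made up, it uses cmp instead of diff for the comparison, and it only uses the -c and -S options of lfs setstripe shown above.

```shell
#!/bin/sh
# Hypothetical helper (not a cluster-provided command).
# restripe FILE COUNT SIZE: replace FILE with a copy striped over COUNT
# OSTs using stripe size SIZE (e.g. 1g).
restripe() {
    file=$1
    count=$2
    size=$3
    tmp="$file.restripe.$$"
    # Refuse to overwrite an existing temporary file.
    if [ -e "$tmp" ]; then
        echo "restripe: $tmp already exists" >&2
        return 1
    fi
    # Create an empty file with the desired striping.
    lfs setstripe -c "$count" -S "$size" "$tmp" || return 1
    # Copy the data in, and only replace the original if the copy matches.
    if cp "$file" "$tmp" && cmp -s "$file" "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example:
#   restripe test.tar 5 1g
```

While the copy exists the data takes up twice the space, so check that there is room for it before restriping a very large file.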

This only touches the surface of what can be done with striping in lustre, for additional information look at:

Archival storage

Archival storage on the original Deepthought cluster

The Division of IT has some 60 TB of storage attached to the original Deepthought cluster which is available for more archival use. This storage is connected via iSCSI from off-campus, and is rather slow, so it should NOT be used by running jobs. It also is NOT backed up.

However, you are encouraged to move data that is no longer actively being used, but still needs to be kept around for a while, from the production data volumes and lustre to this archival storage. We do ask, out of respect for your colleagues on the system, that you purge this area of any data that no longer needs to be kept.

Path                 Filesystem Type        Approximate Size
The following ARCHIVE filesystems are available to all Deepthought users:
/data/dt-archive0    NFS (low bandwidth)    8 TB
/data/dt-archive1    NFS (low bandwidth)    16 TB
/data/dt-archive2    NFS (low bandwidth)    16 TB
/data/dt-archive3    NFS (low bandwidth)    16 TB
/data/dt-archive4    NFS (low bandwidth)    4 TB

Using Google G Suite Drive for Archival Storage

In addition, campus also provides the ability to store large amounts of data on Google's G Suite drive. Please see the Google drive service catalog entry for more information, including restrictions on what data can be stored there and how to set up your account if you have not done so already.

The gdrive utility can be used to transfer files to and from Google's G Suite drive (use module load gdrive to add it to your path). To use it, you must first create a token that allows the gdrive utility to access your Google G Suite drive. To do so, run gdrive about. By default this uses your default Google drive account and stores the token in ~/.gdrive; use the --config flag and/or set GDRIVE_CONFIG_DIR to store the token elsewhere. The command will prompt with a URL and ask for a verification code. Follow the URL in a browser, authenticate to Google, and type the returned verification code at the prompt. This will place a token in your gdrive configuration directory (~/.gdrive unless you used the --config flag or set the GDRIVE_CONFIG_DIR environment variable to point elsewhere). BE SURE TO PROTECT YOUR gdrive configuration directory --- anyone with read access to that directory can access your Google drive as you.

Once you have set up the token as above, the gdrive command will work normally. gdrive help provides usage instructions, but basically you use gdrive download FILEID to download files from G Suite drive and gdrive upload PATH to upload files to G Suite drive.
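The warning above about protecting the gdrive configuration directory can be checked with a few lines of shell. This is a sketch with a made-up function name, is_private; it looks only at the ordinary permission bits shown by ls -ld and does not consider ACLs.

```shell
#!/bin/sh
# Hypothetical helper (not a cluster-provided command).
# is_private DIR: succeed if neither group nor others have any access to DIR.
is_private() {
    # Characters 5-10 of the mode string are the group and other permissions.
    perms=$(ls -ld "$1" | cut -c5-10)
    [ "$perms" = "------" ]
}

# Example: tighten the default gdrive configuration directory if needed
#   is_private "$HOME/.gdrive" || chmod go-rwx "$HOME/.gdrive"
```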

Securing Your Data

Your home directory as configured is private and only you have access to it. Any directories you create outside your home directory are your responsibility to secure appropriately. If you are unsure of how to do so, please submit a help ticket requesting assistance.

If you're a member of a group, you'll want to make sure that you give your group access to these directories, and you may want to consider setting your umask so that any files you create automatically have group read and write access. To do so, add the line umask 002 to your .cshrc.mine file.
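To illustrate what umask 002 does, and how an existing directory might be opened up to your group, here is a rough sketch. Both function names are made up for illustration, and the choice to remove all access for users outside the group is an assumption about what you want, not a cluster requirement.

```shell
#!/bin/sh
# Hypothetical helpers (not cluster-provided commands).

# mode_under_umask MASK: print the permission string a newly created file
# receives when the umask is MASK.
mode_under_umask() {
    mask=$1
    tmpdir=$(mktemp -d) || return 1
    # Set the umask in a subshell so the caller's umask is left unchanged.
    ( umask "$mask"; : > "$tmpdir/f" )
    ls -l "$tmpdir/f" | cut -c1-10
    rm -rf "$tmpdir"
}

# share_with_group DIR: give the group read and write access to everything
# under DIR, and remove all access for users outside the group.
share_with_group() {
    # X adds search permission on directories, and execute permission only
    # on files that are already executable by someone.
    chmod -R g+rwX,o-rwx "$1"
}

# Examples:
#   mode_under_umask 002    # prints -rw-rw-r--
#   mode_under_umask 022    # prints -rw-r--r--
#   share_with_group /lustre/johndoe/shared
```

With umask 002 new files are group writable from the start, so share_with_group is mainly useful for files created before the umask was changed.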