Policies relating to the Deepthought High-Performance Computing Clusters

Please note that this page is still under construction. Therefore not all policies related to the Deepthought HPCCs are currently listed here.

Table of Contents

  1. General Policies
  2. Policies on Usage of Login Nodes
  3. Policies on Usage of Disk Space
    1. Policies Regarding Home Space
    2. Policies Regarding Division-Provided Data Volumes
    3. Policies Regarding Research Group Provided Data Volumes
    4. Policies Regarding Locally Attached Scratch Space
    5. Policies Regarding DIT-Provided Longer Term Storage


General Policies on Usage of High Performance Computing Clusters

The High Performance Computing (HPC) Clusters are part of the information technology resources that the Division of Information Technology makes available to the university community, and as such are covered by the campus Acceptable Use Policy (AUP). All users of the HPC clusters are required to adhere to the campus AUP in addition to the policies specific to the HPC clusters. The AUP applies to all HPC clusters made available by the Division of Information Technology, not just the Deepthought clusters. For example, the MARCC/Bluecrab cluster is still a university IT resource governed by the AUP even though it is housed off-campus.

You should read and familiarize yourself with the Acceptable Use Policy. The AUP includes the following provisions which might be particularly applicable to users of the HPC clusters, but the list below is NOT complete and you are bound by all of the policies in the AUP.

  • "Those using university IT resources [...] are responsible for [...] safeguarding identification codes and passwords" DO NOT SHARE YOUR PASSWORD with anyone. If you have a student, colleague, etc. who needs access to your HPC allocation, request they get added to your allocation.
  • "Engaging in conduct that interferes with others' use of shared IT resources" is prohibited. The HPC cluster is a shared resource. The specific policies below about use of login nodes and disk space are related to this point.
  • "Using university IT resources for commercial or profit-making purposes" is prohibited without written authorization from the university.

In addition to the AUP, the HPC clusters have their own policies, enumerated in this document. Among these are:

  • You are required to promptly comply with all direct requests from HPC systems staff regarding the use of the clusters. This includes requests to reduce disk space consumption or to refrain from particular actions. We are deliberately keeping the list of rigid policy rules as short as possible in order to facilitate the use of this research tool in novel and creative ways. However, when we encounter behavior or practices that interfere with the ability of others to use this shared resource, we will step in and require your prompt compliance with our requests.
  • You are required to monitor your USERNAME@umd.edu (or USERNAME@terpmail.umd.edu) email address. You can read it on a campus mail system, or forward it to another address or email system which you do read, but that is the address at which system staff will contact you if we need to, and you are expected to be monitoring it.

Policies on Usage of Login Nodes

The login nodes are provided for people to access the HPC clusters. They are intended for people to set up and submit jobs, access results from jobs, transfer data to/from the cluster, etc. As a courtesy to your colleagues, you should refrain from doing anything compute intensive on these nodes, as it will interfere with the ability of others to use the HPC resources. Compute intensive tasks should be submitted as jobs to the compute nodes, as that is what the compute nodes are for.
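
For example, a compute intensive task can be wrapped in a small batch script and submitted to the compute nodes rather than run directly on a login node. The sketch below assumes a SLURM-based batch system; the resource values and the program name my_analysis are only illustrative placeholders:

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00
    #SBATCH --mem=4096
    # Run the compute intensive work on a compute node, not on the login node
    ./my_analysis input.dat > output.log

Such a script would be submitted from a lustre or /data/... directory with sbatch (e.g. sbatch my_analysis.sh); the login node is then used only to submit and monitor the job.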

Short compilations of code are permissible. If you are doing a highly parallel or lengthy compilation, you should consider requesting an interactive job and doing your compilation there as a courtesy to your colleagues.
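
As a sketch of the interactive approach (again assuming a SLURM-based batch system, with purely illustrative resource values), the build is run inside an interactive allocation on a compute node rather than on the login node:

    # Request an interactive shell on a compute node
    srun --ntasks=4 --mem=4096 --time=00:30:00 --pty bash
    # Then build inside that shell, using the cores that were allocated
    make -j 4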

Compute intensive calculations, etc. are NOT allowed on the login nodes. If system staff find such jobs running, we will kill them without prior notification. Users found in violation of this policy will be warned, and continued violation may result in suspension of access to the cluster.

WARNING
Do NOT run compute intensive calculations on the login nodes

Policies on Usage of Disk Space

The Division of Information Technology and the various contributing research groups have provided large amounts of disk space for the support of jobs using the Deepthought HPC Clusters. The following policies discuss the use of this space. In general, the disk space is intended for support of research using the cluster, and as a courtesy to other users of the cluster you should try to delete any files that are no longer needed or being used.

WARNING
All Division of Information Technology provided lustre and /data volumes are for the support of active research using the clusters. The only exceptions are the /data/dt-archive* volumes on the original Deepthought cluster. You must remove your data files, etc. from the cluster promptly when you no longer have jobs on the clusters requiring them. This is to ensure that all users can avail themselves of these resources.
WARNING
The ONLY filesystems backed up by the Division of Information Technology on the HPC clusters are the homespaces. Everything else might be irrecoverably lost if there is a hardware failure. So copy your precious files (e.g. custom codes, summarized data) to your home directory for safety.
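
For example, a directory of custom codes or summarized results can be copied from a scratch or data volume back to the backed-up home space with standard tools; the paths below are placeholders only:

    # Copy summarized results into home space, which is backed up nightly
    cp -r /lustre/username/myproject/results-summary ~/myproject/results-summary
    # Or use rsync to copy only the files that have changed since the last copy
    rsync -a /lustre/username/myproject/scripts/ ~/myproject/scripts/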

For the purposes of HPCC documentation and policies, the disk space available to users of the cluster is categorized as indicated below.

  • home space: This is the directory which you see when you log into the systems in the cluster. This home directory is distinct from your normal Glue/TerpConnect home directory, and is distinct between the different HPC clusters. It is visible to all nodes within the specific HPC cluster, but is not visible anywhere else, including other HPC clusters. Home space is provided by the Division to all HPCC users, and is backed up to tape nightly. It is intended for relatively small amounts of valuable information: codes, scripts, configuration files, etc. It is not as highly optimized for performance as the /data/... volumes, so you should avoid doing heavy I/O to your home space in your jobs. Policies related to homespace
  • Division of Information Technology provided data space: This includes lustre (e.g. /export/lustre_1 on Deepthought and /lustre on Deepthought2) and NFS mounted data storage (/data/dt-*). It is provided by the Division of IT and is visible to all nodes in the cluster. All HPCC users can access it (although if your research group has its own data volumes, we request that you use those preferentially). Research-group-owned lustre space is just a reservation of the total lustre space for that research group, so there is no user-visible difference between storage owned by research groups and DIT-owned lustre storage. The NFS data volumes are better optimized for performance than the home space volumes, and the lustre filesystem is better optimized still. But jobs doing heavy I/O should still seriously investigate using local scratch space instead. Neither lustre nor the NFS mounted data volumes are backed up to tape, but they allow for more storage than the home space volumes. Still, remember to store critical data on the home space, which is backed up. Policies related to DIT provided data space.
  • Research group provided data space: Some research groups have purchased additional data space for use by their members. This can be separate NFS mounted data volumes, or part of the lustre filesystem. In the former case, these are special data volumes and access is limited to members of the groups contributing to their purchase. Research groups can also buy lustre storage, which is added into the total lustre pool and then an amount equal to the contribution is reserved for that group's use. Policies related to research group provided data space.
  • local scratch space: Each compute node has local scratch space available as /tmp. For the original Deepthought cluster, this varies significantly depending on the node, but is at least 30 GB. For Deepthought2 nodes, this amount is about 750 GB. This space is available for use by your job while it is running; any files left there are deleted when the job ends. This space is not backed up, and files will be deleted without notice when the job ends. This space is only visible to the node it is attached to; each node of a multinode job will see its own copy of /tmp, which will differ from /tmp on the other nodes. However, being directly attached, this space will have significantly better performance than the NFS mounted volumes. Policies related to local scratch space.
  • DIT-provided longer term storage: Unfortunately, the options for archival storage are rather limited at this time. However,
    • On the original Deepthought cluster, about 60 TB of iSCSI storage is available for storing data which, although important, is not being actively used by jobs. Please see the section on archival storage on the Deepthought cluster for more information.
    • Archival data can also be stored on Google's G Suite drive. This can hold large amounts of data, although transfer times can be less than optimal. See the section on archival storage using G drive for more information.

    These options are available for the storage of files and data not associated with active research on the cluster (such files should not be stored in lustre or the /data volumes). This is useful for data which needs to be kept but rarely accessed, e.g. after a paper is published, etc. While there is no time limit on how long data can stay in these locations, it is still requested (especially on the iSCSI storage on DT1) that you delete items after they are no longer needed. Policies related to longer term storage

WARNING
The /data/dt-archiveN volumes and Google's G Suite drive are the ONLY places provided by the Division of Information Technology for the storage of data not being actively used by computations on the cluster.

A list of all data volumes

Policies on Usage of Home Space

  1. Do NOT start jobs from your home directory or subdirectories underneath it. Run the jobs from lustre or from a /data/... volume.
  2. Jobs should not perform significant I/O to/from homespace volumes. Use lustre, a /data/... volume, or the locally attached scratch space (/tmp).
  3. Delete or move off the HPCC any files which are no longer needed or used.
  4. There is a 10 GB soft quota on home directories. This soft quota will not prevent you from storing more than 10 GB in your home directory; however, a daily check of disk usage is performed, and if you are above the quota you will receive an email requesting that you reduce your disk usage within a grace period of 7 days. The email reminders will continue until usage is reduced or the grace period is over. If you are still over quota at that time, system staff will be notified and more severe emails will be sent, and unless the situation is remedied promptly system staff may be forced to take action, which could involve relocating or deleting your files. This soft quota approach is taken to ensure that all HPCC users get fair access to this critical resource without unduly impacting performance on the cluster, while allowing you some flexibility if you need to exceed the 10 GB limit for a few days. (A simple way to check your usage is sketched below.)
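
One simple way to see how your home directory usage compares to the 10 GB soft quota is with standard tools (the clusters may also provide their own quota-reporting commands; the lines below are only a generic sketch):

    # Total size of your home directory, to compare against the 10 GB soft quota
    du -sh ~
    # Largest items in your home directory, to see what to clean up or relocate
    du -sh ~/* | sort -h | tail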

Policies on Usage of Division of Information Technology Provided Data Space

  1. Jobs should avoid doing extensive I/O to/from /data/... volumes, as NFS performance will degrade, affecting both your jobs and other users of the system. Please look into using lustre or the locally attached scratch space (/tmp) if at all possible; contact DCS if you need assistance with that. This is especially a concern if you have many processes (either a lot of small jobs, or big jobs where each task is doing I/O) accessing these volumes heavily.
  2. Delete or move off the HPCC any files which are no longer needed or used. This space is for the support of jobs running on the system; it is not for archival purposes. Files which are not actively being used by computations on the cluster must be removed promptly to ensure these resources are available for other users.
  3. When the filesystems are filling up, systems staff will send out emails to the largest consumers of space on the affected filesystems requesting that you reduce your footprint. You are required to comply with these requests and promptly reduce your disk usage on the specified filesystems.
  4. Systems staff reserve the right to delete files on the lustre and DIT provided data volumes that are more than 6 months old without notice. While we hope not to invoke this option often, we will do so if users are not complying with the previous policy items. So delete files when they are no longer needed by jobs, and comply when you receive requests to do so from system staff, to avoid having us delete files for you (a sketch showing how to find your older files follows the warnings below).
  5. Files in lustre or in the /data/... volumes are not backed up.
WARNING
The DIT provided data space, both lustre and /data/... volumes, are NOT for archival storage. They are ONLY for files supporting active research on the clusters. You must remove any data which is no longer needed for jobs you are running on the cluster promptly.
WARNING
Lustre and /data/... volumes are NOT backed up.
WARNING
Files older than 6 months on the lustre and /data/... volumes are subject to deletion by system staff without notice. So do yourself a favor and delete unneeded files yourself, and respond promptly to requests to reduce your disk usage to avoid having systems staff reduce your disk usage for you. Our use of the rm command is likely to be much less discriminating between important and unimportant files than yours would be.
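
To help stay within the 6 month policy above, you can locate your own older files before systems staff do. The sketch below uses standard tools; the path is a placeholder and should be replaced with your own project directory:

    # List your files under a project directory not modified in roughly 6 months (180 days)
    find /lustre/username/myproject -user $USER -type f -mtime +180 -ls
    # After reviewing the list, the same find command with -delete removes them;
    # it is left commented out here so nothing is removed by accident
    # find /lustre/username/myproject -user $USER -type f -mtime +180 -delete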

Policies on Usage of Research Group Provided Data Space

  1. Jobs should avoid doing extensive I/O to/from /data/... volumes, as NFS performance will degrade, affecting both your jobs and other users of the system. Please look into using lustre or the locally attached scratch space (/tmp) if at all possible; contact DCS if you need assistance with that.
  2. You should still delete or move off the HPCC any files which are no longer needed or used, so as not to adversely impact other users of your research group.
  3. Files are not backed up.
WARNING
The research group provided data stores are NOT backed up by DIT.

Policies on Usage of Locally Attached Scratch Space

  1. Please have your jobs use locally attached scratch space (/tmp) wherever it is feasible. This generally offers the best disk I/O performance; a sketch of a job script using local scratch follows the warning at the end of this section. Contact DCS if you have questions or need assistance with that.
  2. Files in locally attached scratch space are not backed up.
  3. Files in locally attached scratch space are deleted upon termination of the job.
  4. Although all files in /tmp that belong to you will be deleted when you no longer have any jobs running on the node, it is good practice to delete files yourself at the end of the job where possible. This is especially true if you run many small jobs that can share a node, as otherwise it can take some time for the automatic deletion to occur, which can reduce the space available in /tmp for other jobs.
WARNING
Any files you own under /tmp on a compute node will be deleted once the last job of yours running on the node terminates (i.e. when you no longer have any jobs running on the node).
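
The following is a sketch of a job script that stages data through local scratch (assuming a SLURM-based batch system; the paths and the program name my_code are placeholders only). Input is copied into /tmp, the heavy I/O happens there, and results are copied back to a shared volume before the job ends and the scratch files are deleted:

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --time=02:00:00
    # Create a private working directory in node-local scratch
    WORKDIR=/tmp/$USER.$SLURM_JOB_ID
    mkdir -p "$WORKDIR"
    # Stage input data in from a shared volume
    cp /lustre/username/myproject/input.dat "$WORKDIR"
    cd "$WORKDIR"
    # Do the heavy I/O against local scratch, not the NFS or lustre volumes
    /lustre/username/myproject/my_code input.dat > output.dat
    # Copy results back to the shared volume before the job ends
    cp output.dat /lustre/username/myproject/
    # Clean up local scratch rather than waiting for the automatic deletion
    cd /tmp && rm -rf "$WORKDIR"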

Policies on Usage of DIT-Provided Longer Term Storage

  1. The /data/dt-archiveN volumes and Google's G drive are the ONLY DIT-provided storage where it is permissible to store files and data not associated with active research on the cluster. They can be used to archive data, e.g. data that needs to be kept for a while after a paper is published.
  2. The /data/dt-archiveN volumes are only available from the login nodes of the original Deepthought cluster. They are NOT available from the compute nodes.
  3. Do not use this storage for active jobs.
  4. These volumes are NOT backed up.
  5. Google's G drive storage is NOT on campus, and as such there may be restrictions on what types of data are allowed to be stored there (from a security perspective). Please see the Google drive service catalog entry for more information regarding this.
WARNING
The /data/dt-archiveN volumes are NOT backed up.