Getting help on the HPC clusters

  1. Overview of getting help on the HPC clusters
  2. Information to include/tips for submitting tickets
    1. Using the script command to record actions
  3. Submitting a help ticket
  4. Discussion list
  5. HPC Boot Camp

Overview of getting help on the HPC clusters

We are constantly trying to improve our documentation on the clusters, and we ask that you consult it first for basic questions about usage and how to do things. We are developing a FAQ, and we also have general usage documentation. Please read these before asking questions via help tickets.

While the HPC systems staff will try to assist you on just about any question, we are generally not very familiar with the various applications used in your research, and therefore cannot always provide much useful assistance. Often such questions are best directed at your colleagues. The Division of Information Technology is trying to find ways to facilitate such collaboration (suggestions are welcome), but one mechanism currently in place is the hpcc-discuss mailing list.

Of course, not all questions are covered (or covered clearly) in the documentation, and in these cases you should open a help ticket.

And unfortunately, sometimes there are real hardware, software, or other problems with the system. While we are sometimes aware of these issues from our own monitoring, HPCCs are by their nature complicated beasts, and some issues are not easily detected from monitoring. So if you encounter an issue that you believe to be of a system nature, please open a help ticket.

Information to include/tips for submitting tickets

To help us serve you better, when you submit a help ticket, please:

  • Open a new ticket for new issues. Replying to email from the ticketing system will append to the existing ticket, which is appropriate if you are providing further information, etc. about an ongoing issue. But new issues need to have their own, new ticket. DO NOT REPLY to a previous email if you are starting a new ticket, or at the very least remove the ISSUE= and PROJ= parts of the subject line.
  • Provide a descriptive subject line, succinctly summarizing the problem. Something like "HPC problems" is NOT very useful to us.
  • Include your login name (i.e. your @umd.edu email address).
  • Please include the name of the cluster you are experiencing a problem with (e.g. Deepthought/DT, Deepthought2/DT2, Evergreen, Bswift, etc.).
  • If there are jobs involved in the issue, please provide at least some of the job numbers.
  • Similarly, provide the full path to the script you used to submit the job, and to the stdout/stderr files from the jobs.
  • If you received an error message when running a command (not already in the stderr/stdout files above) please provide the message. The script command (see below) might be helpful.
  • Please do NOT attach the contents of the files above, or include them in the body of the message. Just provide the paths to the files on the system. If you plan to submit the job again with a slightly modified version of the file, please copy it and give us the name that you copied it to.
  • Please, PLEASE do NOT attach screen shots of text. Please get the output into a file (see the script command below to capture output if needed), and give us the path to that file.
  • If you are reporting what seem to be problems with your jobs being stuck in the pending state in the queue, do NOT delete the jobs unless we tell you to do so. Deleting the jobs makes it harder to diagnose the problems.
  • If you are having connection issues, please include the exact command you are running, the host you are trying to connect to, the username you are using (DO NOT INCLUDE PASSWORDS), the approximate time of the failed attempts (as accurately as you can), and if possible the IP address of the machine you are trying to connect from. The URL http://noc.net.umd.edu/cgi-bin/netmgr/whoami will give you that last piece of information.
  • For new tickets, please provide context and complete information. Do NOT assume that we are aware of matters discussed in other tickets you submitted. Several people respond to the tickets; the person answering your current ticket might not have dealt with your previous tickets. We also have thousands of users; even if the same person dealt with your previous issues, they might not remember them. In general, you do not need to, and should NOT, mention previous tickets, but if you think one is relevant, either give the ticket number (so we can look it up) or a succinct but complete summary. For follow-ups on the same ticket number, we already have a history of the ticket, so you do NOT have to repeat things in every response.
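
As a rough illustration of the checklist above, the following sketch prints the basic details in one place so they can be pasted into a ticket. The function name ticket_summary, its arguments, and the example job numbers and path are all hypothetical and not a site-provided tool. It assumes a Bourne-style shell (sh or bash), and the user name it reports is the local account name, so you may still need to add your @umd.edu address by hand.

```shell
#!/bin/sh
# Hypothetical helper (not a site-provided tool): prints the basic details
# a help ticket should mention, without including any file contents.
ticket_summary() {
    cluster="$1"     # cluster name, e.g. Deepthought2
    jobids="$2"      # job numbers involved, space separated
    jobscript="$3"   # full path to the script used to submit the job

    echo "Login name:  $(id -un)"
    echo "Cluster:     $cluster"
    echo "Host:        $(uname -n)"
    echo "Time:        $(date)"
    echo "Job numbers: $jobids"
    echo "Job script:  $jobscript"
}

# Placeholder values for illustration only; substitute your own.
ticket_summary "Deepthought2" "1234567 1234568" "/path/to/submit_job.sh"
```

The output lists only names, numbers, and paths, in keeping with the request above to give file locations rather than file contents.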

Using the script command to record actions

Sometimes when diagnosing an issue, we will ask you to show us exactly what commands you issued and what they returned. Or you may need to show us a long, complicated error message. A useful tool in these cases is the script command; once you issue it, it will start a new shell and log all of your input to, and all of the output from, the new shell. This is not that useful for programs that run in a graphical environment, but it provides a fairly good log for command line processes.

For example, in the following, we log the session to the file help.script in the home directory:


login-1:~: script help.script
Script started, file is help.script
login-1:~: date
Tue Oct 21 10:41:07 EDT 2014
login-1:~:  module list
Currently Loaded Modulefiles:
  1) dept/Glue
login-1:~:  ncap2
ncap2: Command not found.
login-1:~: exit
exit
Script done, file is help.script
login-1:~:
login-1:~:
login-1:~: cat help.script
Script started, file is help.script
login-1:~: date
Tue Oct 21 10:41:07 EDT 2014
login-1:~:  module list
Currently Loaded Modulefiles:
  1) dept/Glue
login-1:~:  ncap2
ncap2: Command not found.
login-1:~: exit
exit

Script done on Tue Oct 21 10:42:51 2014
login-1:~:

NOTE: Always remember to exit the shell started by the script command. And, as in the above example, it can be useful to print the contents of the file (e.g. with the cat command) to verify things were properly recorded.
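
When only one or two commands are involved, plain output redirection may be a lighter alternative to a full script session. The sketch below is an illustration under assumptions, not a required procedure: it assumes a Bourne-style shell (sh or bash), the log file name is an arbitrary placeholder, and no_such_command_xyz merely stands in for whatever command produced your error.

```shell
#!/bin/sh
# Sketch: capture both stdout and stderr of a few commands in one file.
# The log file location and name are placeholders; choose your own.
logfile="${TMPDIR:-/tmp}/help_capture.$$.log"

{
    echo "== date =="
    date
    echo "== failing command =="
    # no_such_command_xyz is a stand-in for the command that fails for you;
    # the part after || records its exit status in the log.
    no_such_command_xyz || echo "exit status: $?"
} > "$logfile" 2>&1

# Review the capture before giving us the path to it.
cat "$logfile"
```

The 2>&1 sends error messages to the same file as normal output, so the log holds both. In csh or tcsh the redirection syntax differs (command >& file captures both streams). As with script, give us the path to the resulting file rather than pasting its contents.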

Submitting a help ticket

There are several ways to actually submit the ticket to the UMD Division of Information Technology:

  • To update an existing ticket, you can just reply to a previous email for that ticket.
  • You can open a new ticket by emailing hpcc-help@umd.edu. Please provide a reasonable subject line.
  • You can submit a new ticket online via the web interface.

NOTE: The Division of Information Technology at the University of Maryland does NOT maintain the MARCC/Bluecrab HPC cluster. While you are welcome to submit a ticket to the Division of Information Technology for support with issues on the Bluecrab cluster, and we will try to assist you, many matters will require or be more readily solved by contacting the MARCC support staff. Again, please provide a reasonable subject line. If you decide to contact both support staffs on the same issue, kindly:

  • send separate emails to marcc-help@marcc.jhu.edu and hpcc-help@umd.edu or blind carbon copy the help email addresses. Both of these email addresses go into ticketing systems, and the automated replies can create some minor havoc when two distinct ticketing systems are included on the same issue.
  • inform us in the ticket that you have contacted both groups.

Discussion list

Although systems staff will try to assist you with just about any problem you encounter on the HPC clusters, our expertise does not extend very far into the various codes that run on the clusters, and certainly not into the science, etc. behind them. Questions of this nature are best directed to your peers and colleagues.

Because some codes are run by users in different groups across campus, the Division of Information Technology is trying to come up with ways to facilitate these collaborative discussions (suggestions are welcome). One mechanism that currently is in place is an open discussion list, hpcc-discuss. This discussion list is an open forum wherein you can ask questions of other members of the list. It is also hoped that you will take the time to assist other, newer members of the Deepthought/Deepthought2 community when they ask questions which you know the answers to.

The discussion list is currently unmoderated, but the intent is to provide a place to ask technical questions regarding the use of the HPC clusters in your research. Questions about basic Unix commands are probably not appropriate, and are better directed to systems staff, as are issues with logging into the systems, etc. But if you have questions regarding the use of specific application software packages, especially questions closely tied to the specifics of your research, this is probably a good forum for them.

To join the list, either:

  1. Go to https://listserv.umd.edu, click on Subscriber's Corner, show All Lists, select hpcc-discuss, subscribe to it, and hit submit.
  2. Send email to listserv@listserv.umd.edu with the contents subscribe hpcc-discuss anonymous.

After doing the above, you will get a confirmation email which contains instructions on how to complete the subscription (either reply with the line "Ok", or visit the provided URL).

You will then receive email sent to the list, and you can reply to such email or send email to hpcc-discuss@umd.edu to post a new message.

Remember, although the Division of Information Technology is making the list available, the usefulness of the list depends on users of the HPC clusters like you subscribing to the list and contributing to it.

HPC Boot Camp

Since 2010, the Division of Information Technology has offered annual HPC Boot Camps. These multiday courses provide a rapid introduction to general high performance computing and parallel programming topics, and are targeted at graduate students, research staff, and faculty members who want a quick introduction to these concepts. The boot camps are not just lectures, but also have a lab component in which you get to work on one of the Division of Information Technology's HPC clusters.