Allocations and Job Accounting

  1. The Basics
  2. Choosing the Account to Use
  3. The Replenishing Process
  4. Monitoring Usage
    1. How many SUs are left?
    2. General information about an allocation
    3. Seeing job history
    4. Monitoring for excessive usage
    5. Usage reports

The Basics of Allocations and Job Accounting

As a user of the cluster, you have access to at least one allocation account: the one belonging to the project which requested your access to the HPC cluster. Some projects have both normal and high-priority allocations, and some users have access to allocations from more than one project. You can see which allocations you have access to with the sbalance command.

All jobs that are submitted are associated with an account; this can be specified with the -A flag when the job is submitted, or else the submitter's default account is used. Specifying the account and changing your default account are covered in more detail in the job submission documentation. With the exception of jobs in the scavenger partition, the walltime of your job multiplied by the number of processor cores consumed will be charged to that allocation. (Because scavenger partition jobs are ultra-low priority and can be preempted, we do not charge for CPU time on that partition.) The charges are in terms of Service Units, typically abbreviated as SU, with 1 SU = 1 hour of walltime on a CPU-core. So for a 3 hour job running on all the cores of a node with 2 processors and 8 cores/processor, the cost would be 3 hours * 2 processors * 8 cores/processor, or 48 SU.
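As a sketch of the charging rule just described, the shell function below multiplies walltime by the cores consumed. The name su_cost and its argument order are hypothetical (this is not a tool provided on the cluster); it assumes a whole number of hours and ignores scavenger-partition jobs, which are not charged.

```sh
#!/bin/sh
# Hypothetical helper (not a site-provided tool): estimate a job's SU charge.
#   su_cost HOURS NODES CORES_PER_NODE
# 1 SU = 1 hour of walltime on one CPU-core, so the charge is
# walltime (hours) x total cores consumed on all nodes.
su_cost() {
    hours=$1; nodes=$2; cores_per_node=$3
    echo $((hours * nodes * cores_per_node))
}

# The example from the text: 3 hours on one node with 2 processors x 8 cores.
su_cost 3 1 $((2 * 8))    # prints 48
```

Remember that CORES_PER_NODE should be the number of cores consumed, not merely used: a job with exclusive access to a node is charged for every core on it.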

WARNING
You are charged for cores consumed, not used. E.g., if you request 1 core on a node but also request that no other jobs be run on the node, you will be charged for ALL cores on the assigned node, since no one else can use them while your job is running.

Although the account does not get debited until the job finishes, the scheduler keeps track of all jobs running against a given account, and keeps track of how many SUs are required to complete these jobs (using the walltime requirements requested when the job was submitted). Before a new job charging against that account is started, the scheduler makes sure that there are sufficient funds to complete it AND all currently running jobs charging against that account. If there are, the job can be started; otherwise, it is left in the pending state with a reason code AssociationJobLimit.
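The start-time funds check can be modelled roughly as follows. This is a simplified illustration, not the scheduler's actual logic: the function can_start is hypothetical, balances and costs are whole SUs, and each job's cost is taken to be its requested walltime times its cores, since nothing is debited until a job ends.

```sh
#!/bin/sh
# Simplified model (not the scheduler's implementation) of the check made
# before a new job charging against an account is started.
#   can_start BALANCE NEW_JOB_SU [RUNNING_JOB_SU ...]
# BALANCE is the account's current balance in SU.  Running jobs have not
# been debited yet, so each still counts at its full requested cost.
can_start() {
    balance=$1; shift
    needed=0
    for su in "$@"; do                 # the new job plus every running job
        needed=$((needed + su))
    done
    if [ "$needed" -le "$balance" ]; then
        echo "start"
    else
        echo "pending (AssociationJobLimit)"
    fi
}

can_start 1000 48 400 500     # 948 SU needed, 1000 available: start
can_start 1000 200 400 500    # 1100 SU needed, 1000 available: pending
```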

Groups can get allocations in one of several ways:

  1. If the group contributed equipment into the cluster, they get normal and high-priority accounts as part of a paid "co-op" allocation.
  2. If the group did not contribute equipment into the cluster, the group can submit a proposal, and if the proposal is approved by the HPC Allocations and Advisory Committee (AAC), the group will be granted a one-time AAC grant allocation.
  3. At some point, it will be possible to purchase SUs from the AAC. This will behave much like the one-time AAC grant allocations; the only real difference is that you provide an FRS number to charge against instead of submitting a proposal.

Paid "co-op" Allocations

Groups which contribute funds for hardware for the cluster will typically get a paid "co-op" allocation. Unlike the "condo" models that many other institutions grant, in the "co-op" model you do not get certain nodes assigned to you. Instead, you get a share in the entire cluster. The amount of computing power of the contributed hardware (in SUs, or CPU-core-hours, per quarter) is calculated, and after deducting 20% for overhead (which includes all scheduled and unscheduled downtime of the nodes), the remainder is provided as funds in your accounts. This is your base allotment.

Co-op allocations get two accounts: for a project named PROJECT, there will be a normal-priority allocation PROJECT and a high-priority allocation PROJECT-hi. As the name implies, jobs submitted with the high-priority allocation will be preferentially scheduled over jobs using the normal-priority allocation. (With two exceptions: jobs in the scavenger partition are not charged for, and always run at ultra-low priority. Jobs in the debug partition always run at high priority, but are severely time-constrained.) Both co-op accounts can also run jobs for extended lengths of time; other accounts cannot run jobs for more than 3 days, but jobs charged against co-op accounts can use the *-extended QOSes and run for up to 7-14 days, depending on the number of nodes required.

The two accounts are replenished on different schedules. The normal-priority account is replenished quarterly, to the base allotment described above. The high-priority account is then topped up from the normal-priority account every month, to one third of this base allotment. So every month, the monthly allotment of your contribution (1/3 of the base quarterly allotment) is available for use at high priority. But you can also use your normal-priority allotment, effectively borrowing time from future or past months in the quarter, if you have a temporary surge in your need for computing power.

This is generally to your benefit over the traditional "condo" model; in that model you get access to the nodes your funds have purchased, but only to those nodes and nothing else. So if you bought 3 nodes, you can never run a job requiring more than 3 nodes (except by buying additional nodes). But in the co-op model, you could run a 9 node job if desired, albeit for only a third of the time that you could run 3 node jobs. And even if you never need more than your 3 nodes for a single job, you could also run three 3-node jobs simultaneously in the co-op model. Again, not for the entire month/quarter, but for a while. This might come in handy when your demand for CPU power is not steady over the course of the quarter; e.g. if there is a conference in the middle of the quarter, you might want to use up most of your entire quarter's allotment in preparation for it, even if that leaves you underfunded for the rest of the quarter.

AAC-Granted Allocations

The HPC Allocations and Advisory Committee (AAC) can grant one-time unpaid allocations to faculty and students for small projects, classes, feasibility tests, etc. These allocations are granted out of computing resources purchased with funds from the Division of Information Technology. Jobs submitted with these allocations run at normal priority (except for jobs submitted to the scavenger partition, which runs all jobs at ultra-low priority, and short jobs submitted to the debug partition).

Choosing the Account to Use

If you only have a single account (check with the sbalance command), there is nothing to choose, and you can skip this section.

If you have multiple accounts due to your membership in multiple groups, you may wish to choose which account you use based on the job. E.g., if the job is doing something for group A, you probably should only submit it using one of the group A accounts, even if you also have access to group B accounts. If the research areas of the two groups overlap, you will need to follow whatever group-specific policies may exist (contact your colleagues).

If you have access to both normal and high-priority accounts, you probably want to submit the job to the high-priority account. High-priority accounts are replenished monthly and their funds do not carry over, so you might as well use them.

Of course, you need to ensure that the account you choose has sufficient funds for your job. If, when your job is about to start, there are not sufficient funds to cover its expected cost (based on the amount of walltime you requested for the job), your job will not run; instead it will be left in the pending state (with reason AssociationJobLimit) until funds become available. Because the account does not get debited until a job finishes, this calculation considers the balance in the account as well as the funds expected to be required to complete not only the job you are submitting, but ALL currently running jobs charging against the same account. Again, this is based on the walltimes requested for the jobs, which is another reason why accurately specifying walltime is important.

WARNING
Note that the queuing system will NOT automatically select another account if, for example, your high-priority account is depleted but funds remain in your normal-priority account. The job will just remain pending.

Note also that others in your group may have access to the same account, so even if funds were available when you submitted a job, jobs belonging to other group members may have started since then and reduced the funds available.
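Because the queuing system will not fall back to another account for you, you may want to make that choice yourself at submission time. The following is a minimal sketch, assuming the sbalance output layout shown in the monitoring section below; the function pick_account, the PROJECT account names and job.sh are hypothetical. It only sees the balance at the moment it runs, so it cannot account for jobs that other group members start afterwards.

```sh
#!/bin/sh
# Hypothetical helper (not a site-provided tool).  Reads sbalance-style
# output on stdin and prints the first account, in the order given, whose
# "Available" balance is at least NEEDED_KSU.  Assumes the layout of the
# sbalance examples ("Account: NAME (...)" then "Available: N kSU").
#   pick_account NEEDED_KSU ACCOUNT [ACCOUNT ...]
pick_account() {
    needed=$1; shift
    balances=$(cat)                      # stdin can only be read once
    for acct in "$@"; do
        if printf '%s\n' "$balances" | awk -v a="$acct" -v n="$needed" '
            $1 == "Account:" { cur = $2 }
            $1 == "Available:" && cur == a && $2 + 0 >= n + 0 { ok = 1 }
            END { if (ok) exit 0; exit 1 }'
        then
            echo "$acct"
            return 0
        fi
    done
    return 1                             # no listed account has enough
}

# Example use (commented out; requires sbalance and sbatch):
# acct=$(sbalance | pick_account 1.5 PROJECT-hi PROJECT) &&
#     sbatch -A "$acct" job.sh
```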

See the job submission documentation for more information about specifying the account to be charged when submitting a job.

The Replenishing Process

Only co-op accounts get automatically replenished. For other types of accounts, jobs will deplete funds in the account until either the account runs out of funds, or the time limit for the project (class, etc.) for which the HPC Allocations and Advisory Committee granted the account expires and the account is deleted.

Co-op accounts get refreshed every month. For each group which contributed funds towards equipment in the cluster, a raw quarterly value equal to the amount of computation that can be done on that equipment in a quarter is computed (currently just the number of cores times the number of hours in a quarter; no adjustment for CPU speed is made). From this, 20% is removed for Division of Information Technology overhead; this covers administrative and other downtime, and some may be used for unpaid allocations. The result is the group's quarterly base allotment.
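The base allotment arithmetic can be sketched as below. The function name base_allotment is hypothetical, and the example values (20 cores, a 91-day quarter) are purely illustrative; integer division truncates fractional SUs.

```sh
#!/bin/sh
# Sketch of the quarterly base allotment rule, assuming no CPU-speed
# adjustment: cores x hours in the quarter, minus 20% overhead.
#   base_allotment CORES DAYS_IN_QUARTER   -> quarterly base allotment in SU
base_allotment() {
    cores=$1; days=$2
    raw=$((cores * days * 24))           # raw quarterly core-hours
    echo $((raw * 80 / 100))             # keep 80% after overhead
}

quarterly=$(base_allotment 20 91)        # 20 cores, 91-day quarter
echo "quarterly base allotment: $quarterly SU"          # 34944 SU
echo "monthly base allotment:   $((quarterly / 3)) SU"  # 11648 SU
```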

Every quarter, on the first day of the quarter (i.e. 1 Jan, 1 Apr, 1 Jul, 1 Oct), the balance of each group's normal-priority account is reset to the quarterly base allotment. Any amount left over from the previous quarter is lost.

On the first of every month (after the quarterly reset, if it is also the start of the quarter), the group's high-priority account is replenished by transferring funds from the normal-priority account. The high-priority balance is brought up to one third of the quarterly base allotment (i.e. the monthly base allotment), provided that there are sufficient funds in the normal-priority account. If there are not, whatever amount is left in the normal-priority account is moved to the high-priority account.

If your group uses up exactly its high-priority allocation every month, and does not directly use its normal-priority account, then at the beginning of each month in a quarter you should see:

  • Month 1: normal-priority=2X, high-priority=X (start of quarter)
  • Month 2: normal-priority=X, high-priority=X
  • Month 3: normal-priority=0, high-priority=X

where X is your monthly base allotment. I.e., at the start of the quarter, the normal-priority allocation is reset to the quarterly base allotment, and the monthly base allotment is immediately transferred to the high-priority account. At the start of the second and third months, a monthly base allotment is again transferred to the high-priority account, leaving the normal-priority account depleted at the start of month 3.
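The replenishment rules can be modelled with a short simulation. This is an illustrative model, not the accounting system's code: simulate_quarter is hypothetical, it starts from an empty high-priority account, and usage that would take a balance below zero is simply clipped at zero.

```sh
#!/bin/sh
# Model of the replenishment rules (not the actual accounting code).
#   simulate_quarter X HI_USED NORMAL_USED
# X           monthly base allotment in SU (quarterly base = 3 * X)
# HI_USED     SU consumed from the high-priority account each month
# NORMAL_USED SU consumed directly from the normal-priority account each month
# Prints both balances at the start of each month, after replenishment.
simulate_quarter() {
    x=$1; hi_used=$2; normal_used=$3
    normal=0; hi=0
    for month in 1 2 3; do
        if [ "$month" -eq 1 ]; then
            normal=$((3 * x))            # quarterly reset; leftovers are lost
        fi
        need=$((x - hi))                 # top high-priority up to X ...
        if [ "$need" -gt "$normal" ]; then
            need=$normal                 # ... using only what normal holds
        fi
        normal=$((normal - need)); hi=$((hi + need))
        echo "Month $month: normal-priority=$normal, high-priority=$hi"
        hi=$((hi - hi_used))             # usage during the month
        if [ "$hi" -lt 0 ]; then hi=0; fi
        normal=$((normal - normal_used))
        if [ "$normal" -lt 0 ]; then normal=0; fi
    done
}

simulate_quarter 100 100 0    # the idealized 2X / X / 0 pattern with X = 100
```

Running simulate_quarter 100 50 0 instead, where half of each month's high-priority funds go unused, shows the variation discussed in the following paragraph: less is transferred each month, and the normal-priority account still holds 100 SU at the start of month 3.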

In practice, you will see some variation, due to the high-priority account not being completely depleted at the end of the month (so less than a full monthly base allotment is transferred out of the normal-priority account, leaving it with more funds), and to jobs running against the normal-priority account, reducing its funds. Note: there is no rollover of unused funds from quarter to quarter in the normal-priority account, or from month to month in the high-priority account. (However, unused funds in the high-priority account mean that fewer funds are transferred out of the normal-priority account to refresh it, resulting in extra normal-priority funds.)

WARNING
It is recommended that you generally use up your high-priority funds before using normal-priority funds. If you do not use them, they go away (or effectively get converted to normal priority) at the end of the month.

Monitoring Allocations

You and your research group are responsible for ensuring proper rationing of the funds in your account(s). Excessive use of funds for a "co-op" type of project in the first month of a quarter could result in no funds at all for the next two months in either the high-priority or standard priority allocation.

This can be deliberate and beneficial, e.g. if you have important deadlines at the end of the first month of the quarter and are willing to "borrow ahead" to complete the computations needed before the deadlines. This is an advantage of the model used by the Deepthought HPC clusters: you can use nearly 3 times the power of the computers you purchased in a single month to rush out computations, at the cost of very limited usage during the following two months (but since that is after the deadlines, it might not be important).

But if this occurs because some junior member of the group is sending an excessive number of very expensive jobs, this can be quite problematic, especially as you might not notice the impact of the errant user until too late.

The Division of Information Technology cannot tell which jobs are important and which are not, nor what is good usage of your allocation funds and what is not. If we notice seriously problematic usage (e.g. a job reserving 10 nodes but only running processes on 1 node), we will do our best to notify and instruct the relevant users. But you are responsible for monitoring your own jobs, and it behooves you to monitor the jobs of other users of your allocations. We will provide the necessary tools to do so, but we strongly advise all research groups to have at least one person regularly monitor the usage of their allocations' funds, to ensure there are no problems or at least to catch any problems early.

How many SUs are left in my allocation?

The first level of monitoring of your allocations is with the sbalance command. E.g.

payerle:login-1:~>sbalance
Account: test-hi (dt)
Limit:     163.52 kSU
Available: 163.47 kSU 
Used:      0.05 kSU (0.0 % of limit)

Account: test (dt)
Limit:     327.04 kSU
Available: 325.33 kSU 
Used:      1.71 kSU (0.5 % of limit)

Without any arguments, it lists usage metrics for all accounts to which you have access. The above listing is from early in the quarter for a co-op type project; note that both accounts are nearly full, and that the test account has nearly double the amount of the test-hi account. The line starting with "Used" gives not only the number of kSU used, but also the usage as a percentage of the limit. If this percentage is significantly higher than the percentage of the month (for your high-priority account) or of the quarter (for normal-priority accounts) that has elapsed so far, you might need to be concerned. E.g., if one week into the month you see that usage on your high-priority account is over 30% of the limit, your group is burning SUs faster than they are being renewed (one week is only about 23% of a month), and you might spend some time at the end of the month with nothing in your high-priority account.
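That comparison can be automated with a small filter over sbalance output. This is a sketch only: flag_fast_burn is a hypothetical name, it assumes the output layout shown above (an "Account:" line followed by a "Used:" line with the percentage in parentheses), and you supply the elapsed percentage yourself, measured against the month for a high-priority account or the quarter for a normal-priority one.

```sh
#!/bin/sh
# Hypothetical helper (not a site tool): read sbalance-style output on stdin
# and report each account whose "Used" percentage exceeds PCT_ELAPSED, the
# percentage of the current month (high-priority) or quarter
# (normal-priority) that has elapsed.  Assumes the
# "Used: N kSU (NN.N % of limit)" layout shown in the examples.
#   flag_fast_burn PCT_ELAPSED
flag_fast_burn() {
    awk -v elapsed="$1" '
        $1 == "Account:" { acct = $2 }
        $1 == "Used:" {
            pct = substr($4, 2)              # "(37.6" -> "37.6"
            if (pct + 0 > elapsed + 0)
                printf "%s: %s%% used, %s%% of period elapsed\n", acct, pct, elapsed
        }'
}

# Example use (commented out; requires sbalance), about one week into a
# 30-day month:
# sbalance -account PROJECT-hi | flag_fast_burn 23
```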

For AAC grant type accounts, there is no monthly or quarterly replenishment. The "Limit" should reflect the amount of compute time the AAC granted you, and the percentage is how much of that you have used. If the percentage used is significantly greater than the percent of your work which is complete, you should consider working on an update to your proposal to request more time.

If you are tasked with monitoring the usage of the accounts by your colleagues in the project (or have taken said task upon yourself), you can use the -all flag to sbalance to see who is using the funds in the account. You might also wish to use the -account flag to limit the output to a single account, e.g.:

login-1: sbalance -account test-hi -all
Account: test-hi (dt)
Limit:     163.52 kSU
Available: 102.07 kSU 
Used:      61.45 kSU (37.6 % of limit)
        User jtl used 17.6044 kSU (28.6 % of total usage)
        User kevin used 13.3456 kSU (21.7 % of total usage)
        User payerle used 30.5000 kSU (49.6 % of total usage)

This lists the same information as before, with the addition of showing every user who has used the account in the time period, showing not only the number of kSU they consumed, but what percentage of the total usage for the account. E.g., in the example above, you can see that user payerle is using almost as much as users kevin and jtl combined. You can add the flag --nosuppress0 if you want to also see lines for everyone with access to the allocation but who did not consume any time since the last refresh.

The --help option to sbalance will display usage options, most of which were discussed above.

The time period for the usage statistics depends on the type of account and project. For co-op (replenishing) projects, it is from the start of the month. For AAC grant accounts, it is from the start of the project/grant.

General information about an allocation

General information about allocations you belong to can be obtained with the my_projects command. This command can only be run from the login nodes (i.e. it will not work on the compute nodes), and provides basic information regarding allocations you belong to.

Run my_projects with no arguments to display information for all allocations of which you are a member, or my_projects ALLOCATION_NAME to display information for a specific allocation (you can give multiple ALLOCATION_NAMEs to list information for multiple allocations). You may also wish to include one or two --verbose (or -v for short) flags to include more information. The --help flag gives a full description of all the flags the command accepts.

Without any verbose flags, it will display the name of the allocation project, the name of the parent project (if any), and the department and college associated with the project.

With one verbose flag, it will also display the "points-of-contact" for the project, and the members of the project. The points-of-contact are the people who are authorized to add/remove members from the allocation. It will also display the base kSU level, and indicate whether or not the project auto-replenishes each quarter.

The information shown with two verbose flags is probably not very useful: basically a description of the project (which is usually not informative) and the over/underusage alert thresholds which determine if/when the points-of-contact are emailed regarding excessive or insufficient usage of their allocation (if no value is listed, a global default is used). The over/underusage thresholds are explained further in the section on monitoring for excessive usage.

NOTE: my_projects reports on allocation projects, not individual allocation accounts. Some projects have both a standard and a high-priority allocation account; however, they are still one project, and only one listing will be shown by the my_projects command. The base kSU level is the total of the standard and high-priority kSU at the start of the quarter.

Seeing job history

The sacct command can be used to view the accounting records of jobs, both past and currently running. It takes some time to run, and can display a fair amount of information (which is documented in its man page). You will almost always wish to restrict it to a time range, so to see the usage of account foo for the month of November 2014, one could use


login-1> sacct --format=JobID,User,Account,ReqCPUs,AllocCPUS,Elapsed,CPUTime \
	-a  -X  -S  2014-11-01 -E 2014-11-30 -A foo

       JobID      User    Account  ReqCPUS  AllocCPUS    Elapsed    CPUTime 
------------ --------- ---------- -------- ---------- ---------- ---------- 

2717747       payerle  foo             16         20 1-00:00:09 20-00:03:00 
2717748       payerle  foo             16         20 1-00:00:09 20-00:03:00 
2717749       payerle  foo             16         20 1-00:00:09 20-00:03:00 
2717750       payerle  foo             16         20 1-00:00:08 20-00:02:40 
2717751       payerle  foo             16         20 1-00:00:08 20-00:02:40 
2717752       payerle  foo             16         20 1-00:00:08 20-00:02:40 
2717753       payerle  foo             16         20 1-00:00:17 20-00:05:40 
2717754       payerle  foo             16         20 1-00:00:17 20-00:05:40 
2717755       payerle  foo             16         20 1-00:00:17 20-00:05:40 
2717756       payerle  foo             16         20 1-00:00:12 20-00:04:00 
2718384       payerle  foo             10          0   00:00:00   00:00:00 
2718385       payerle  foo             10          0   00:00:00   00:00:00 
2718386       payerle  foo             10          0   00:00:00   00:00:00 

Here,

  • ReqCPUs is the number of cores requested
  • AllocCPUs is the number of cores allocated to the job. The jobs shown were run in exclusive mode, so the full 20 cores on the node were allocated to it.
  • Elapsed is the elapsed walltime for the job
  • CPUTime is the elapsed walltime times AllocCPUs. This is what is charged against the foo account.
  • The last three jobs are still pending, so have not been allocated any CPUs yet, and have not accumulated any walltime (or charges).
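To total the charges for an account over a period rather than reading them row by row, the CPU time can be summed. A minimal sketch, assuming sacct's CPUTimeRAW field (the CPUTime value expressed in seconds) and its -n (no header) and -P (pipe-delimited) options; sum_su is a hypothetical name. It counts every job sacct returns, including the CPU time accumulated so far by running jobs and any scavenger-partition jobs, which are not actually charged.

```sh
#!/bin/sh
# Hypothetical helper: sum CPU-core-seconds (one value per line, e.g. the
# CPUTimeRAW field from "sacct -n -P --format=CPUTimeRAW ...") and print
# the total in SU (1 SU = 3600 CPU-core-seconds).
sum_su() {
    awk '{ s += $1 } END { printf "%.2f SU\n", s / 3600 }'
}

# Example use (commented out; requires Slurm's sacct):
# sacct -a -X -n -P -S 2014-11-01 -E 2014-11-30 -A foo \
#       --format=CPUTimeRAW | sum_su
```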

Monitoring for excessive usage

An important aspect of managing the usage of an allocation is ensuring that SUs are being consumed at a reasonable rate. The system intentionally allows flexibility in the rate at which SUs are consumed; e.g. if you have a major conference in the middle of a quarter, you might wish to (and can) use up most or all of the funds of a quarterly replenishing allocation in the first month of the quarter, leaving (almost) nothing for the remaining two months. If that is your intent (and the rest of the users of the allocation agree with you), all is well. However, if a few profligate users consume most of the quarterly allocation in the first month without the consent of the rest of the users of the allocation, there is a major problem.

From the system's point of view, the two examples above look the same: the SUs were consumed at an excessive rate in the first month of the quarter. We cannot tell whether that was done for a good reason or by mistake by inexperienced users; that is a judgement call which the points-of-contact (PoCs) of the allocation will need to make. What we can do is try to alert the PoCs when something like that appears to be happening, hopefully early enough that, if it is happening improperly, behavior can be adjusted before it leads to serious problems.

NOTE: the following only applies to quarterly auto-replenishing allocations. Non-replenishing allocations (e.g. allocations granted by the AAC on the Deepthought HPC clusters and Engineering Startup Allocations (i.e. allocations whose names start with "esu-")) are not currently supported by the tools described below. Since they do not auto-replenish, you can use the sbalance command described previously to see how much of the total allocation has been consumed, and compare that to your estimates of the amount of work needed to complete the project.

The command check_project_usage compares the fraction of the allocation's quarterly allotment that has been consumed in the current quarter to the fraction of the quarter that has gone by. If the fraction of SUs used exceeds the fraction of the quarter elapsed by more than a certain threshold, it will flag that allocation as having unsustainable usage. (It similarly checks for significant underusage, but the default threshold for that is set so as to never flag underusage.) The global default overusage threshold is 15 percentage points; PoCs can request a different default threshold for a specific allocation (just send email to hpcc-help@umd.edu; this will also change the threshold used in the automated mail), and anyone can specify thresholds on the command line as well. E.g., if we are one third of the way into the quarter (i.e. one month into the quarter) and 50% of the allocation has been used, an alert will be raised using the global default threshold (as 33% + 15% = 48% < 50%). If a threshold of 20% were used, no alert would be raised (as 33% + 20% = 53% > 50%).
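The overusage rule amounts to a single comparison. This sketch mirrors the arithmetic described above and is not the actual check_project_usage code; over_usage is a hypothetical name, both percentages are supplied by hand, and the threshold defaults to the global 15 percentage points.

```sh
#!/bin/sh
# Sketch of the overusage rule (not the actual check_project_usage code):
# flag when  percent_used > percent_into_period + threshold
#   over_usage PCT_INTO PCT_USED [THRESHOLD]   (THRESHOLD defaults to 15)
over_usage() {
    awk -v into="$1" -v used="$2" -v thr="${3:-15}" 'BEGIN {
        if (used + 0 > into + thr) print "excessive"; else print "ok"
    }'
}

over_usage 33.3 50        # 33.3 + 15 = 48.3 < 50   -> excessive
over_usage 33.3 50 20     # 33.3 + 20 = 53.3 >= 50  -> ok
```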

By default, the check_project_usage command checks all allocations of which you are a member for excessive usage. If one or more allocations appear to be consumed at an unsustainable rate, it prints usage information and warnings for those allocations. If no excessive usage is detected, it normally prints nothing. (NOTE: if you are also a member of non-replenishing allocations, you will get a brief warning stating that the code is skipping them.)

You can provide the --help or -h flags to get full usage information. You can specify allocation project names on the command line to only check the named allocations (NOTE: these are allocation project names, so should not include the -hi suffix; because the standard and high-priority balances are linked, it checks both simultaneously.) You can also give the --verbose or -v flag, which will cause usage information to be displayed even if no over/underusage condition was flagged.

login-1> check_project_usage
Project: testproj1
Time: 2016 Oct 14
Overquota Threshold: 15.0%
Underquota Threshold: 100.0%
------          TimePeriod (percent into) Allocation   Available    PctUsed
HiPriority      month      (  43.7% into) 67.500 kSU   15.858 kSU   76.5%  
Total           quarter    (  14.7% into) 202.500 kSU  125.970 kSU  37.8%  

*** Excessive rate of consumption for HiPriority!
*** Excessive rate of consumption for Total!
login-1> 
login-1> check_project_usage -v 
Project: testproj1
Time: 2016 Oct 14
Overquota Threshold: 15.0%
Underquota Threshold: 100.0%
------          TimePeriod (percent into) Allocation   Available    PctUsed
HiPriority      month      (  43.7% into) 67.500 kSU   15.858 kSU   76.5%  
Total           quarter    (  14.7% into) 202.500 kSU  125.970 kSU  37.8%  

*** Excessive rate of consumption for HiPriority!
*** Excessive rate of consumption for Total!
========================================
Project: testproj2
Time: 2016 Oct 14
Overquota Threshold: 15.0%
Underquota Threshold: 100.0%
------          TimePeriod (percent into) Allocation   Available    PctUsed
HiPriority      month      (  43.7% into) 60.181 kSU   53.543 kSU   11.0%  
Total           quarter    (  14.7% into) 180.544 kSU  162.452 kSU  10.0%  
login-1> 
login-1> check_project_usage testproj2
login-1> 

The first time we execute check_project_usage above, it displays the usage for testproj1 with warnings of excessive usage for both the high-priority allocation account (as 76% > 43% + 15%) and the total allocation (as 37% > 14% + 15% ). The second run has the verbose flag, and so in addition to showing the excessive usage for testproj1, it also displays the usage for testproj2 even though it is not problematic (PctUsed is less than the "percent into" the month/quarter, respectively). The final invocation does not have the verbose flag, but specifies to only check testproj2; this produces no output as there is no excessive usage condition.

If you wish to include this command in your dot files to alert you to overusage issues whenever you log in, be sure to run it only for interactive sessions. Otherwise it will needlessly slow down non-interactive shells, and any output it produces can break file transfers with scp, etc. E.g., for csh or tcsh users, something like:

if ( $?prompt ) then
	check_project_usage
	# ... other interactive-only commands, if desired ...
endif

If your default shell is sh or bash, something like:

if [ -n "$PS1" ]; then
	check_project_usage
	# ... other interactive-only commands, if desired ...
fi

The Division of Information Technology runs a similar script every few hours on every auto-replenishing allocation, and will send email to the points-of-contact for the allocation if it is flagged as being consumed at an unsustainable rate. To avoid "spamming" the points-of-contact, we will not send email to a given user about a given allocation more than once every three days. In this automated case, the project-specific overusage threshold is used (or the global default, if no project-specific threshold was set). A point-of-contact can request a change to the threshold for any of their allocations by sending an email request to hpcc-help@umd.edu, and can similarly request a change in the minimum number of days between emails sent to them.

NOTE: the thresholds are per-project/allocation, and affect alerts to all points-of-contact for that allocation. The minimum number of days between emails is per person, and affects alerts for all allocation projects for which that person is a point-of-contact. Also note that the limit on email frequency is applied separately to each project you are a point-of-contact for, so if you receive an alert about allocation A today, you may still receive an alert about allocation B tomorrow, but should not receive another alert about allocation A for several days.

Usage reports

For non-replenishing allocations, the sbalance command returns information about usage of the allocation over the allocation's lifetime. For replenishing allocations, however, most of the tools mentioned above only return data about usage for the current quarter. This is probably what most users are concerned with most of the time (e.g., if I want to figure out whether there are enough kSUs to run my job now, usage from previous quarters is irrelevant), but sometimes one needs information about usage over longer time scales. This is especially true for people who manage "super-allocations".

There are a couple of tools available to get more historic information regarding allocation use:

The Deepthought XDMoD website runs the Open XDMoD (Open XD Metrics on Demand) web application, which can present many metrics pertaining to the Deepthought clusters in graphical form. One can see how many kSUs were consumed by a given allocation as a function of time, or what the average job length for an allocation was over the past year. Some of the more advanced filtering and reporting features require one to register for a "login account" on the XDMoD website (unfortunately, there is no easy way to tie this into our existing authentication system); you can do so from the website.

The slurm-usage-report command runs from the login nodes of either Deepthought cluster. It examines all the job records related to the specified allocation account(s) and provides summaries (as opposed to the sacct command, which lists details for each job but does not summarize). Because it has to go through all the job records, it tends to be a bit slow.

We only discuss some of the more commonly used options below; the command supports a --help or -h option which provides more information on its usage (including some options to provide even more usage information). The commonly used options are:

  • --account=ACCOUNT: this specifies which allocation accounts should be looked at. You can specify multiple allocation accounts by repeating this argument and/or by replacing ACCOUNT with a comma-delimited list of allocation account names. If no allocation accounts are given, it defaults to all allocation accounts for which you are either a member or a point-of-contact.
  • --unit=UNIT: this specifies which unit should be used in output. The default is 'SU', but 'kSU', 'cpu-min' or even 'cpu-sec' are alternatives.
  • --start=START: this specifies the start of the time-period being examined. It defaults to the start of the current quarter, but you can specify another start time by giving a date in the YYYY-MM-DD format (or a date and time in the YYYY-MM-DDThh:mm:ss format).
  • --end=END: this specifies the end of the time-period being examined. There is no default (although see also the --timeperiod flag). If given, it uses the same format as the start time.
  • --timeperiod=TIMEPERIOD: this is an alternative way of specifying the end of the time-period being examined. It should not be used if the --end flag was used. It defaults to 'quarter', but 'day', 'week', 'month', and 'year' are also valid options.
  • --machine-parsable: If this flag is given, the output is produced in a delimiter-separated-values format, using a pipe ('|') character as the delimiter. This is useful if you want to do further processing on the output, e.g. in a spreadsheet.
  • --noheaders: If this flag is given, the normal header text is not printed. This might be useful when using --machine-parsable.
  • --byuser: Normally the script summarizes usage at the allocation account level, but if this flag is given the information is presented by user and allocation account.
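A common use of slurm-usage-report is looking back at the previous quarter. The sketch below computes that quarter's start date for --start; prev_quarter_start is a hypothetical helper, and it assumes calendar quarters beginning 1 Jan, 1 Apr, 1 Jul and 1 Oct, as described in the section on replenishment. The commented-out invocation uses only the options listed above, with foo as a placeholder account.

```sh
#!/bin/sh
# Hypothetical helper: given a date as YYYY-MM-DD, print the first day of
# the previous calendar quarter (quarters start 1 Jan, 1 Apr, 1 Jul, 1 Oct).
prev_quarter_start() {
    year=${1%%-*}
    rest=${1#*-}
    month=${rest%%-*}
    month=${month#0}                          # "07" -> "7" (avoid octal)
    q_month=$(( (month - 1) / 3 * 3 + 1 ))    # first month of this quarter
    q_month=$((q_month - 3))                  # back one quarter
    if [ "$q_month" -lt 1 ]; then
        q_month=$((q_month + 12)); year=$((year - 1))
    fi
    printf '%04d-%02d-01\n' "$year" "$q_month"
}

# Example use (commented out; run on a login node):
# slurm-usage-report --account=foo,foo-hi --unit=kSU --byuser \
#     --start="$(prev_quarter_start "$(date +%Y-%m-%d)")" --timeperiod=quarter
```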

The slurm_jobstats_for_alloc command also prints information about usage of allocations on the Deepthought cluster, but is generally more geared toward helping managers of superallocations determine which suballocations have and have not been using the cluster. By default, it prints the following information for each allocation account (for the specified time period):

  • the total number of jobs charged against that allocation account during the specified time period
  • the total number of CPU-cores allocated for jobs for that allocation account
  • the total number of SUs charged against that allocation account during the specified period.
  • the date and job number of the first job run against that allocation account in the time period.
  • the date and job number of the last job run against that allocation account in the time period.

We only discuss some of the more commonly used options below; the command supports a --help or -h option which provides more information on its usage (including some options to provide even more usage information). The commonly used options are:

  • --account ACCOUNT: Specifies the allocation account(s) to be examined. You can give multiple allocation accounts by repeating this argument and/or providing a comma-delimited list of accounts in place of ACCOUNT (Note: you will need to quote the list if it contains spaces).
  • --file FILENAME: Sometimes it is easier to give a file containing a list of allocation account names, one per line. This allows you to specify such a file.
  • --treat-as-projects: Normally, the values given to the --account argument or in the FILENAME argument are interpreted as allocation accounts. With this flag, they will be treated as project names; e.g. giving ACCOUNT as 'foo' will result in getting results for 'foo' and 'foo-hi'. (NOTE: if used on an account for which there is no high-priority allocation account, in conjunction with --nosuppress0 below, one might get a bogus listing for the non-existent high-priority allocation.)
  • --start STARTDATE: Specifies the start of the time period to collect statistics for. Should be given as YYYY-MM-DD or, if a specific time of day is desired, YYYY-MM-DDTHH:MM:SS.
  • --end ENDDATE: Specifies the end of the timeperiod. Same format as STARTDATE.
  • --nosuppress0: Normally, allocation accounts for which no jobs were found are elided from output. Use this if you wish to see them.
  • --machine-parsable: This will generate output in a delimiter-separated-values format, using a pipe ('|') character as the delimiter. Useful if you want to bring the data into a spreadsheet for further analysis.
  • --combine-project-allocations: Normally, output is displayed for each allocation account separately. If this flag is given, the output for 'foo' and 'foo-hi' allocation accounts are combined.