HPC Acknowledgement
If you use any HPC resource in your research efforts, we appreciate you adding the following statement in your Acknowledgement section: “We would like to thank The University of Alabama and the Office of Information Technology for providing high-performance computing resources and support that have contributed to these research results.”
Useful commands for detailed Slurm job information:
The information below provides some useful guidance on basic Slurm commands: how to get started with submitting a job and how to specify the partition and QOS.
All Slurm command options can be checked with the man pages, e.g. man sinfo.
sinfo --format "%20N %.6D %20P %.11T %.4c %.8z %.8m %.8d %.6w %.8f %20E"
will give each node’s number of CPUs and memory per node in MB, among other details
You can also experiment with sinfo, sinfo -s, and sinfo -N -l.
Note: -N, -l, and -s will all be overridden by the --format option.
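For example, the --format option can be combined with a partition filter to narrow the listing to a single partition (the partition name here is just an illustration):
sinfo -p main --format "%20N %.6D %.11T %.4c %.8m"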
scontrol show job job_id
gives detailed information about the specified job
scontrol show node compute-1-1
gives the node information for compute-1-1
sacct -j job_id --format="MaxRSS"
will give you the memory usage of a job
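A slightly expanded variant (the job ID below is a placeholder) that also reports elapsed time and final state alongside the memory high-water mark:
sacct -j 123456 --format="JobID,JobName,Elapsed,MaxRSS,State"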
If, for some particular reason, you want to run the job on an individual node within the partition you specified, you can request it in the sbatch file with, e.g., #SBATCH --nodelist compute-1-1
slurmtop
gives a good overview of the current status of each node and how the currently running jobs are allocated across the CPUs of all nodes
To check the running jobs on a specific node: squeue -w compute-1-10
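Similarly, to list only your own running and pending jobs:
squeue -u $USER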
How do I submit jobs to a specific queue?
Slurm doesn’t really have queues; the closest analog would be the combination of partition and QOS. A partition is a group of nodes, and a QOS is a policy that says how many cores you can use for how long, whether jobs can be preempted, etc.
Submit with sbatch -p partition -q qosname
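For example, a minimal batch script sketch targeting the main partition with the main QOS (the job name, walltime, and program are placeholders):
#!/bin/bash
#SBATCH -p main               # partition
#SBATCH -q main               # QOS (24 hour limit, preemptable)
#SBATCH -n 1                  # number of tasks
#SBATCH -t 12:00:00           # walltime, must fit within the QOS limit
#SBATCH -J example_job        # placeholder job name

./my_program                  # placeholder for your executable
Submit the script (saved, say, as job.sh) with sbatch job.sh, or override the partition and QOS on the command line as shown above.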
What partitions and QOSes are available?
The following partitions are limited to stakeholders who own the compute nodes in each: owners, highmem, ultrahigh, mh1, gpu_owners.
The following partitions can be used by anyone: main, long, threaded, gpu.
The long and threaded partitions include separate sets of nodes all owned by OIT, so anyone can use them without being preempted by a stakeholder who owns the nodes. The main partition includes all of the nodes from the long, owners, highmem, ultrahigh, and mh1 partitions, but jobs in the main partition may be subject to preemption if they end up on a node owned by a stakeholder who then submits a job in the owners, highmem, ultrahigh, or mh1 partition. The main partition allows jobs of any size, but the higher-priority jobs submitted in owners, highmem, or ultrahigh are limited to the amount of compute resources the stakeholder has purchased. The idea is that the people who own compute nodes are guaranteed access to the resources they own, within a reasonable timeframe, but others can make opportunistic use of the same hardware when it would be sitting idle.
Then there are the different options for qos. Each stakeholder has their own qos named after their username or their department, for example, avolkov1, math, etc., which allows them to use the resources they purchased for an unlimited time, preempting others. There are a few specialized qos policies that can only be used on a specific partition, for example on the threaded partition you must use the threaded qos, and on the gpu partition you must use the gpu qos. The main qos is the one you’d usually use in the main partition, allowing up to 24 hour runtime on any number of nodes / cores, but jobs are preemptable as described above.
The long qos allows up to 7 day runtime, and can be used on either the long partition or the main partition. Overall, long qos jobs are limited to about 1/4 of the cluster at any given time so that resources will be available for shorter jobs within a reasonable timeframe. If used on the main partition, it may mean that the job is preemptable, but it may be easier for SLURM to find somewhere to run the job since more nodes are included in the main partition. If used on the long partition it will mean that the job cannot be preempted but the types of nodes are more limited.
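For instance, a week-long job could request the long partition and QOS with directives along these lines (a sketch; the walltime format is days-hours:minutes:seconds):
#SBATCH -p long               # or -p main, accepting possible preemption
#SBATCH -q long
#SBATCH -t 7-00:00:00         # up to 7 days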
In summary:
-p main
: This is the partition which we expect to be used most of the time. QOSes available to use with this are “main”, “long”, and “debug”.
-p long
: This partition contains nodes that cannot be preempted by an owner. Use only the “long” QOS with this.
-p owners
: This partition is for system stakeholders who own nodes. It gives a higher priority and may preempt jobs running in the main partition. Users are assigned an owner QOS based on their group.
-p highmem
: This partition is only available to stakeholders who have purchased nodes with moderately increased amounts of memory per core. It gives a higher priority and may preempt jobs running in the main partition. Users are assigned an owner QOS for this if they are in these particular stakeholders groups.
-p ultrahigh
: This partition is like highmem, but even more so. It is typically reserved for those who have purchased nodes with 1TB of memory.
-p threaded
: This partition contains a small number of OIT-owned nodes intended for software that uses lightweight processes / threads to parallelize, rather than a distributed framework like MPI. The nodes have slightly faster CPUs than the others, and much more memory. Use only the “threaded” QOS with this.
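A sketch of a batch script for this partition, assuming an OpenMP-style program (the thread count and program name are placeholders):
#!/bin/bash
#SBATCH -p threaded
#SBATCH -q threaded           # the required QOS on this partition
#SBATCH -n 1                  # a single task...
#SBATCH -c 8                  # ...with 8 CPUs for its threads (placeholder)

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_program         # placeholder for your multithreaded executable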
QOSes give extra properties or apply limitations to Partitions.
-q main
: This is the QOS which we expect to be used most of the time. It has a limit of 24 hours per job so that users with pending jobs can expect theirs to begin in a reasonable time. It has no resource limits other than length of run. Users should submit and requeue their jobs in this one. The use of check-pointing is highly advised. Note that this is not the default queue; you will have to request it explicitly.
-q long
: This queue will run jobs for up to 1 week. It is limited to a total of 309 CPUs and is intended for jobs that cannot easily be set up to checkpoint for the main qos.
-q debug
: This is a 15 minute debug queue. You can test a new job here without worrying that it might hang for too long and cause problems for other people. It has a little extra priority so it can sneak in and get your test done.
Additionally, owners each have their own QOS which is named after their username or department. The owner QOS lets an owner allocate the number of nodes up to their stake but does not limit the amount of time for which a job will run. An owner’s QOS is only valid for the “owners”, “highmem”, or other stakeholder partition. You would not select the “main” or “long” partition for one of these. For example, -p owners -q math
: gives math department users access for up to 16 cores in the owners partition for an unlimited time.
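As a sketch, that example might look like this in a batch script (the QOS name and core count follow the math example above; substitute your own group’s QOS and stake):
#!/bin/bash
#SBATCH -p owners
#SBATCH -q math               # replace with your group's or your own QOS
#SBATCH -n 16                 # up to the group's purchased stake

./my_program                  # placeholder for your executable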
How do I submit GPU jobs?
OIT purchased one compute node with a GPU, a Tesla V100 with 16GB. Since then, a stakeholder purchased a second GPU to install in the same node. Each GPU is associated with one 12-core CPU in the node, so we recommend requesting half the resources of the node for a typical GPU job:
#!/bin/bash
#SBATCH -p gpu
#SBATCH -q gpu
#SBATCH -n 1
#SBATCH -c 12
#SBATCH --mem=45G
#SBATCH --gres=gpu:v100:1

... run software that uses the gpu
The second GPU, a Tesla V100 with 32GB, would be available with --gres=gpu:v100-32:1 instead. It is also possible to request both GPUs with --gres=gpu:2 (in which case the job can also use 24 cores and about 90GB of memory), or request any available GPU with --gres=gpu:1. Under some circumstances, a job in the gpu partition may be preempted by a job in the gpu_owners partition if the owner needs immediate access to their GPU, but we are still working out the details because they don’t own the entire GPU node.
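For example, a sketch of a batch script requesting both GPUs, using the limits mentioned above (the program name is a placeholder):
#!/bin/bash
#SBATCH -p gpu
#SBATCH -q gpu
#SBATCH -n 1
#SBATCH -c 24
#SBATCH --mem=90G
#SBATCH --gres=gpu:2

./my_gpu_program              # placeholder for software that uses both GPUs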