The Phoenix cluster nodes are grouped into sets called partitions, each fulfilling a different purpose. These partitions can be considered separate "job queues", each of which can have constraints on certain resources (nodes, processors, memory, time, etc.), so-called quality of service (QoS) constraints, associated with it. The Phoenix cluster runs the SLURM scheduler to manage all partitions.
The following will help you select the right partition for your job.
Contents
- 1 How to submit to a partition (queue)
- 2 Partitions
How to submit to a partition (queue)
There are two ways to tell SLURM which partition your job should run on. The first is to specify the partition directly on the command line when submitting a job:
sbatch -p <partition> jobscript
The second, more common, way is to define it in your jobscript via:
#SBATCH -p <partition>
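For example, a minimal job script using the second method might look like the following sketch (the partition name, resource values and the echo body are illustrative placeholders; substitute your own program):

```shell
#!/bin/bash
# Minimal SLURM job script (illustrative values; adjust to your job).
#SBATCH -p batch            # partition (queue) to submit to
#SBATCH -n 1                # number of tasks
#SBATCH --time=01:00:00     # wall-time limit
#SBATCH --mem=4G            # memory per node

# Replace this placeholder with your actual program:
PARTITION="${SLURM_JOB_PARTITION:-batch}"
echo "Running in partition: ${PARTITION}"
```

Submit it with sbatch jobscript. The #SBATCH lines are comments to the shell, so SLURM reads them as directives while a plain bash run ignores them.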
Partitions
The Phoenix cluster currently supports a number of different major SLURM partitions:
- batch
- cpu
- skylake
- skypool
- gpu
- volta
- test
- skytest
- highmem
The batch partition
This is the default partition, and in the majority of cases you should submit your job to the batch partition. When the batch partition is specified, the scheduler uses an internal algorithm to determine which partitions your job is eligible for, and the job is routed to all eligible partitions. The following partitions are considered for job eligibility:
- cpu, skylake, skypool for CPU-only jobs
- gpu, volta for GPU jobs
The majority of jobs will be eligible for multiple partitions, and will run in the first partition with available resources that meet the job requirements.
The cpu, skylake and skypool partitions
These partitions are meant for general computational jobs that do not require GPU accelerators and need only a moderate amount of RAM. Most of the jobs submitted to Phoenix will run in these partitions.
cpu partition limits
The cpu partition targets Phoenix nodes that have the Haswell Intel CPU architecture. It is also the partition with the most resources (i.e. nodes) associated with it.
| cpu job constraints | |
| --- | --- |
| Max job time | 3-00:00:00 |
| Max cpus/node | 32 |
| Max RAM per cpu | 4 GB |
| Max RAM per node | 125 GB |
| Cost | 1 SU per CPU hour |
| Can use GPUs | No |
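As a sketch, a job script requesting the full per-node limits of the cpu partition could look like this (values are illustrative; the arithmetic at the end is only a worked SU-cost example, not part of a real job):

```shell
#!/bin/bash
# Illustrative cpu-partition job using the per-node maxima from the table.
#SBATCH -p cpu
#SBATCH -N 1                  # one node
#SBATCH -n 32                 # max 32 cpus/node
#SBATCH --mem-per-cpu=4G      # max 4 GB RAM per cpu
#SBATCH --time=3-00:00:00     # max job time: 3 days

# At 1 SU per CPU hour, 32 cpus for the full 72 hours cost at most:
MAX_SU=$((32 * 72))
echo "Maximum possible cost: ${MAX_SU} SU"
```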
skylake partition limits
The skylake partition targets Phoenix nodes that have the Skylake Intel CPU architecture. This partition is suitable for general computational jobs that do not require any GPU accelerators, and a moderate to large amount of RAM. The Skylake nodes have up to 384 GB of memory available, which is triple the amount available to the Haswell nodes.
| skylake job constraints | |
| --- | --- |
| Max job time | 3-00:00:00 |
| Max cpus/node | 40 |
| Max RAM per cpu | 9665 MB |
| Max RAM per node | 377 GB |
| Cost | 1.25 SU per CPU hour |
| Can use GPUs | No |
skypool partition limits
Jobs that are eligible to run on the skylake partition will also be considered for the skypool partition. The purpose of the skypool partition is to maximise CPU utilisation across the Skylake nodes by allowing CPU-only jobs that have a sufficiently small core-per-node requirement to run alongside GPU jobs in the volta partition.
| skypool job constraints | |
| --- | --- |
| Max job time | 3-00:00:00 |
| Max cpus/node | 32 |
| Max RAM per cpu | 9665 MB |
| Max RAM per node | 301 GB |
| Cost | 1.25 SU per CPU hour |
| Can use GPUs | No |
Long QoS
If it is not possible to run your job within three days, even after e.g. increasing the number of CPUs or amount of RAM, or adding check-pointing (saving the current state and restarting from it), we offer a long QoS upon special request. Each request is assessed on an individual basis. Accessing the long QoS requires a fair-share factor of greater than 0.25.
If you have been granted access to the long QoS, you can select it by adding the following to your job submission script:
#SBATCH --qos=long
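Assuming access has been granted, a long-QoS job script might look like the following sketch (partition and resource values are illustrative):

```shell
#!/bin/bash
# Illustrative long-QoS job; only valid once access has been granted.
#SBATCH -p cpu
#SBATCH -n 8
#SBATCH --qos=long
#SBATCH --time=7-00:00:00   # long QoS raises the limit to 7 days

# 7 days expressed in hours, as used for SU accounting:
LONG_HOURS=$((7 * 24))
echo "Requested wall time: ${LONG_HOURS} hours"
```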
The long QoS features are the following:
| long QoS constraints | |
| --- | --- |
| Max job time | 7-00:00:00 |
The gpu and volta partitions
These partitions are suitable for computational jobs which require GPU accelerators. Programmes that can run efficiently on GPUs can see speed-ups of 10x or better (comparing one GPU hour with one CPU hour).
gpu partition limits
The gpu partition targets the Phoenix nodes that have Nvidia K80 accelerators. For the Nvidia K80s, the cost for one GPU-hour is 8 service units (SU). The minimum number of cores per GPU is 2, and the maximum is 4. The K80 nodes have 2 x K80 cards, and as each K80 is a dual-GPU device, there is a maximum of 4 GPUs that can be used per node.
| gpu job constraints | |
| --- | --- |
| Max job time | 2-00:00:00 |
| Max cpus/node | 16 |
| Max RAM per cpu | 4 GB |
| Max RAM per node | 64 GB |
| Cost | 8 SU per GPU hour |
| Can use GPUs | Yes |
The resources available for jobs that are assigned to the gpu partition depend on the number of GPUs requested. The scheduler controls these limits by automatically assigning each job one of three QoS flags, named gxs, gxm and gxl. These flags are designed to optimise overall system utilisation, with each providing different core and memory limits based on the number of GPUs required. The following table summarises the constraints.
| QoS | gxs | gxm | gxl |
| --- | --- | --- | --- |
| GPUs/node | 1 | 2 | 3 or 4 |
| Max cpus/node requestable | 4 | 8 | 16 |
| Max RAM/node requestable | 16 GB | 32 GB | 64 GB |
| Max RAM per cpu | 4 GB | 4 GB | 4 GB |
| Max job time | 2-00:00:00 | 2-00:00:00 | 2-00:00:00 |
| Cost | 8 SU per GPU hour | 8 SU per GPU hour | 8 SU per GPU hour |
Jobs submitted to the gpu partition will also be considered for routing to the volta partition, if the resource requirements allow for it.
If you wish to specifically request K80 GPUs for your job, you can do so using the kepler gres subtype. For example, to request 2 kepler GPUs, place the following line in your job script:
#SBATCH --gres=gpu:kepler:2
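Putting the pieces together: a single-K80 job falls under the gxs QoS, so it can request at most 4 cpus and 16 GB of RAM per node. A sketch with illustrative values:

```shell
#!/bin/bash
# Illustrative single-K80 job; one GPU means the gxs QoS limits apply.
#SBATCH -p gpu
#SBATCH --gres=gpu:kepler:1   # one K80 GPU
#SBATCH -n 4                  # gxs: up to 4 cpus/node
#SBATCH --mem=16G             # gxs: up to 16 GB RAM/node
#SBATCH --time=2-00:00:00     # max job time: 2 days

# At 8 SU per GPU hour, one GPU for the full 48 hours costs at most:
K80_SU=$((8 * 48))
echo "Maximum possible cost: ${K80_SU} SU"
```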
volta partition limits
The volta partition targets the Phoenix nodes that have Nvidia V100 accelerators. This partition is suitable for all computational jobs which require up to 2 GPU accelerators per node. In particular, the Nvidia V100 cards have a number of tensor cores that are optimised for deep learning workloads.
For the Nvidia V100s, the cost for one GPU-hour is 32 service units (SU). The minimum number of cores per GPU is 2, and the maximum is 8. There is a maximum of 2 V100 GPUs that can be used per node.
| volta job constraints | |
| --- | --- |
| Max job time | 2-00:00:00 |
| Max cpus/node | 16 |
| Max RAM per cpu | 9665 MB |
| Max RAM per node | 151 GB |
| Cost | 32 SU per GPU hour |
| Can use GPUs | Yes |
The resources available for jobs that are assigned to the volta partition depend on the number of GPUs requested. The scheduler controls these limits by automatically assigning each job one of two QoS flags, named vxs and vxm. These flags are designed to optimise overall system utilisation, with each providing different core and memory limits based on the number of GPUs required. The following table summarises the constraints.
| QoS | vxs | vxm |
| --- | --- | --- |
| GPUs/node | 1 | 2 |
| Max cpus/node requestable | 8 | 16 |
| Max RAM/node requestable | 76 GB | 152 GB |
| Max RAM per cpu | 9.5 GB | 9.5 GB |
| Max job time | 2-00:00:00 | 2-00:00:00 |
| Cost | 32 SU per GPU hour | 32 SU per GPU hour |
If you wish to specifically request V100 GPUs for your job, you can do so using the volta gres subtype. For example, to request 2 volta GPUs, place the following line in your job script:
#SBATCH --gres=gpu:volta:2
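For example, a job using both V100s on a node falls under the vxm QoS and can be sketched as follows (values illustrative):

```shell
#!/bin/bash
# Illustrative dual-V100 job; two GPUs means the vxm QoS limits apply.
#SBATCH -p volta
#SBATCH --gres=gpu:volta:2    # both V100 GPUs on the node
#SBATCH -n 16                 # vxm: up to 16 cpus/node
#SBATCH --mem=150G            # within the vxm per-node RAM limit
#SBATCH --time=2-00:00:00     # max job time: 2 days

# At 32 SU per GPU hour, two GPUs for the full 48 hours cost at most:
V100_SU=$((32 * 2 * 48))
echo "Maximum possible cost: ${V100_SU} SU"
```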
The test and skytest partitions
These partitions are meant for job testing. Jobs submitted to them have elevated priority to ensure a quick start time. Strict limits apply to the wall time and the number of cpus a test job can use, and running through the testing partitions is three times more expensive. Furthermore, users with a fair-share factor of less than 0.25 cannot use these queues until their fair-share factor rises above that threshold.
test partition limits
The test partition targets Phoenix nodes which have Haswell cores and K80 GPUs.
| test job constraints | |
| --- | --- |
| Max job time | 02:00:00 |
| Max RAM per cpu | 4 GB |
| Max RAM per node | 64 GB |
| Cost | 3 SU per CPU hour, 24 SU per GPU hour |
| Max number of cpus | 16 |
| Max number of GPUs | 4 |
| Can use GPUs | Yes |
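A short test job might look like the following sketch (values illustrative; keeping requests small helps the elevated-priority queue turn over quickly):

```shell
#!/bin/bash
# Illustrative test-partition job with a short wall time.
#SBATCH -p test
#SBATCH -n 2
#SBATCH --time=00:30:00

# At 3 SU per CPU hour, 2 cpus for the full 2-hour limit cost at most:
TEST_SU=$((3 * 2 * 2))
echo "Maximum possible cost: ${TEST_SU} SU"
```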
skytest partition limits
The skytest partition targets Phoenix nodes which have Skylake cores and V100 GPUs.
| skytest job constraints | |
| --- | --- |
| Max job time | 02:00:00 |
| Max RAM per cpu | 9665 MB |
| Max RAM per node | 190 GB |
| Cost | 3.75 SU per CPU hour, 96 SU per GPU hour |
| Max number of cpus | 20 |
| Max number of GPUs | 2 |
| Can use GPUs | Yes |
The highmem partition
This partition is meant for jobs that have high memory requirements. The Phoenix cluster currently holds 3 nodes with 512 GB of RAM and 3 nodes with 1.5 TB of RAM. Demand for these nodes can fluctuate. Make sure to check your memory requirements thoroughly before submitting to the highmem partition to avoid unnecessary queuing times.
highmem partition limits
The highmem partition targets Phoenix nodes which have Haswell cores and a minimum of 512 GB of RAM.
| highmem job constraints | |
| --- | --- |
| Max job time | 3-00:00:00 |
| Max RAM per cpu | 16 GB (512 GB nodes) or 48 GB (1.5 TB nodes) |
| Max RAM per node | 503 GB (512 GB nodes) or 1511 GB (1.5 TB nodes) |
| Cost | 1 SU per CPU hour |
| Can use GPUs | No |
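As a sketch, a highmem job on one of the 512 GB nodes could request up to the 16 GB per-cpu limit, e.g. 16 cpus for 256 GB in total (values illustrative):

```shell
#!/bin/bash
# Illustrative highmem job on a 512 GB node (16 GB per-cpu limit).
#SBATCH -p highmem
#SBATCH -n 16
#SBATCH --mem-per-cpu=16G     # per-cpu maximum on the 512 GB nodes
#SBATCH --time=3-00:00:00

# Total memory requested: 16 cpus at 16 GB each:
TOTAL_GB=$((16 * 16))
echo "Total memory requested: ${TOTAL_GB} GB"
```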