Phoenix Partitions and QoSs


The Phoenix cluster nodes are grouped into different sets (partitions), each fulfilling a different purpose. These partitions can be thought of as separate "job queues", each of which can have one or more sets of constraints on particular resources (nodes, processors, memory, time, etc.), so-called quality of service (QoS) constraints, associated with it. The Phoenix cluster runs the SLURM scheduler to manage all partitions.

The following will help you select the right partition for your job.


How to submit to a partition (queue)

There are two ways to tell SLURM which partition your job should run on. The first is to specify it directly on the command line when submitting a job, i.e.:

sbatch -p <partition> jobscript

The second, more common, way is to define it in your jobscript via:

#SBATCH -p <partition> 
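
For example, a minimal jobscript selecting the cpu partition might look like the following sketch (the executable ./my_program and the resource values are placeholders, not Phoenix defaults):

#!/bin/bash
#SBATCH -p cpu               # partition (queue) to submit to
#SBATCH -N 1                 # number of nodes
#SBATCH -n 4                 # number of cores
#SBATCH --time=01:00:00      # walltime (HH:MM:SS)
#SBATCH --mem=8G             # total memory for the job

./my_program

Note that a -p option given on the sbatch command line takes precedence over the value set in the jobscript.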

Partitions

The Phoenix cluster currently supports a number of different major SLURM partitions:

  • batch
  • cpu
  • skylake
  • skypool
  • gpu
  • volta
  • test
  • skytest
  • highmem

The batch partition

This is the default partition, and in the majority of cases you should submit your job to the batch partition. When the batch partition is specified, the scheduler uses an internal algorithm to determine which partitions your job is eligible for, and the job is re-routed to all eligible partitions. The following partitions are considered for job eligibility:

  • cpu, skylake, skypool for CPU-only jobs
  • gpu, volta for GPU jobs

The majority of jobs will be eligible for multiple partitions, and will run in the first partition with available resources that meet the job requirements.
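
As a sketch of how this looks in practice (jobscript.sh is a placeholder name), you can submit without naming a partition, which is equivalent to -p batch, and then check where the scheduler placed the job:

sbatch jobscript.sh
squeue -u $USER -o "%.10i %.12P %.10T %.20R"   # job id, partition, state, nodes/reason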

The cpu, skylake and skypool partitions

These partitions are meant for general computational jobs that do not require any GPU accelerators and need only a moderate amount of RAM. Most of the jobs submitted to Phoenix will run in these partitions.

cpu partition limits

The cpu partition targets Phoenix nodes that have the Haswell Intel CPU architecture. It is also the partition with the most resources (i.e. nodes) associated with it.

cpu job constraints
Max Job time 3-00:00:00
Max cpus/node 32
Max RAM per cpu 4 GB
Max RAM per node 125 GB
Cost 1 SU per CPU hour
Can use GPUs No
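
For example, a jobscript that uses a full Haswell node within these limits might look like this sketch (./my_cpu_program is a placeholder):

#!/bin/bash
#SBATCH -p cpu
#SBATCH -N 1
#SBATCH -n 32                # up to 32 cores per node in the cpu partition
#SBATCH --time=3-00:00:00    # maximum walltime for this partition
#SBATCH --mem=125G           # stays within the 125 GB per-node limit

./my_cpu_program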

skylake partition limits

The skylake partition targets Phoenix nodes that have the Skylake Intel CPU architecture. This partition is suitable for general computational jobs that do not require any GPU accelerators, and a moderate to large amount of RAM. The Skylake nodes have up to 384 GB of memory available, which is triple the amount available to the Haswell nodes.

skylake job constraints
Max Job time 3-00:00:00
Max cpus/node 40
Max RAM per cpu 9665 MB
Max RAM per node 377 GB
Cost 1.25 SU per CPU hour
Can use GPUs No
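
A sketch of a jobscript that takes advantage of the larger per-core memory on the Skylake nodes (./my_cpu_program is a placeholder):

#!/bin/bash
#SBATCH -p skylake
#SBATCH -N 1
#SBATCH -n 40                # up to 40 cores per Skylake node
#SBATCH --time=1-00:00:00
#SBATCH --mem-per-cpu=9G     # within the 9665 MB per-core limit

./my_cpu_program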

skypool partition limits

Jobs that are eligible to run on the skylake partition will also be considered for the skypool partition. The purpose of the skypool partition is to maximise CPU utilisation across the Skylake nodes by allowing CPU-only jobs that have a sufficiently small core-per-node requirement to run alongside GPU jobs in the volta partition.

skypool job constraints
Max Job time 3-00:00:00
Max cpus/node 32
Max RAM per cpu 9665 MB
Max RAM per node 301 GB
Cost 1.25 SU per CPU hour
Can use GPUs No

Long QoS

If for some reason it is not possible to run your job within three days, e.g. by increasing the number of CPUs or the amount of RAM, or by check-pointing (saving the current state and restarting from it), we offer a long QoS upon special request. Each request will be assessed on an individual basis. Accessing the long QoS requires a fair-share factor greater than 0.25.

If you have been granted access to the long QoS, you can select it by adding the following to your job submission script:

#SBATCH --qos=long

The long QoS has the following constraints:

long QoS constraints
Max Job time 7-00:00:00
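
Assuming you have been granted access, a CPU job running under the long QoS might be set up as in the following sketch (the partition choice and ./my_long_job are assumptions for illustration):

#!/bin/bash
#SBATCH -p batch
#SBATCH --qos=long           # only accepted once access has been granted
#SBATCH --time=7-00:00:00    # up to 7 days under the long QoS
#SBATCH -n 16
#SBATCH --mem-per-cpu=4G

./my_long_job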

The gpu and volta partitions

These partitions are suitable for computational jobs which require GPU accelerators. Programs that can run efficiently on GPUs can see speed-ups of 10x or better (comparing 1 CPU hour with 1 GPU hour).

gpu partition limits

The gpu partition targets the Phoenix nodes that have Nvidia K80 accelerators. For the Nvidia K80s, the cost for one GPU-hour is 8 service units (SU). The minimum number of cores per GPU is 2, and the maximum is 4. The K80 nodes have 2 x K80 cards, and as each K80 is a dual-GPU device, there is a maximum of 4 GPUs that can be used per node.

gpu job constraints
Max Job time 2-00:00:00
Max cpus/node 16
Max RAM per cpu 4 GB
Max RAM per node 64 GB
Cost 8 SU per GPU hour
Can use GPUs Yes


The resources available for jobs that are assigned to the gpu partition depend on the number of GPUs requested. The scheduler controls these limits by automatically assigning each job one of three QoS flags, named gxs, gxm and gxl. These flags are designed to optimise the overall system utilisation, with each providing different core and memory limits based on the number of GPUs required. The following table summarises the constraints.

QoS                          gxs          gxm          gxl
GPUs/node                    1            2            3 or 4
Max cpus/node requestable    4            8            16
Max RAM/node requestable     16 GB        32 GB        64 GB
Max RAM per cpu              4 GB         4 GB         4 GB
Max Job time                 2-00:00:00 (all QoS)
Cost                         8 SU per GPU hour (all QoS)

Jobs submitted to the gpu partition will also be considered for routing to the volta partition, if the resource requirements allow for it.

If you wish to specifically request K80 GPUs for your job, you can do so using the kepler gres subtype. For example, to request 2 kepler GPUs, place the following line in your job script:

#SBATCH --gres=gpu:kepler:2
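
Putting this together, a 2-GPU job on the K80 nodes that stays within the gxm limits (8 cores, 32 GB) might look like the following sketch (./my_gpu_program is a placeholder):

#!/bin/bash
#SBATCH -p gpu
#SBATCH -N 1
#SBATCH -n 8                  # gxm QoS: at most 8 cores with 2 GPUs
#SBATCH --time=2-00:00:00     # maximum walltime for the gpu partition
#SBATCH --mem=32G             # gxm QoS: at most 32 GB per node
#SBATCH --gres=gpu:kepler:2   # request two K80 GPUs specifically

./my_gpu_program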

volta partition limits

The volta partition targets the Phoenix nodes that have Nvidia V100 accelerators. This partition is suitable for all computational jobs which require up to 2 GPU accelerators per node. In particular, the Nvidia V100 cards have a number of tensor cores that are optimised for deep learning workloads.

For the Nvidia V100s, the cost for one GPU-hour is 32 service units (SU). The minimum number of cores per GPU is 2, and the maximum is 8. There is a maximum of 2 V100 GPUs that can be used per node.

volta job constraints
Max Job time 2-00:00:00
Max cpus/node 16
Max RAM per cpu 9665 MB
Max RAM per node 151 GB
Cost 32 SU per GPU hour
Can use GPUs Yes


The resources available for jobs that are assigned to the volta partition depend on the number of GPUs requested. The scheduler controls these limits by automatically assigning each job one of two QoS flags, named vxs and vxm. These flags are designed to optimise the overall system utilisation, with each providing different core and memory limits based on the number of GPUs required. The following table summarises the constraints.

QoS                          vxs          vxm
GPUs/node                    1            2
Max cpus/node requestable    8            16
Max RAM/node requestable     76 GB        152 GB
Max RAM per cpu              9.5 GB       9.5 GB
Max Job time                 2-00:00:00 (all QoS)
Cost                         32 SU per GPU hour (all QoS)


If you wish to specifically request V100 GPUs for your job, you can do so using the volta gres subtype. For example, to request 2 volta GPUs, place the following line in your job script:

#SBATCH --gres=gpu:volta:2
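
Putting this together, a single-GPU job on the V100 nodes that stays within the vxs limits (8 cores, 76 GB) might look like the following sketch (./my_gpu_program is a placeholder):

#!/bin/bash
#SBATCH -p volta
#SBATCH -N 1
#SBATCH -n 8                  # vxs QoS: at most 8 cores with 1 GPU
#SBATCH --time=1-00:00:00
#SBATCH --mem=64G             # within the vxs limit of 76 GB per node
#SBATCH --gres=gpu:volta:1    # request one V100 GPU specifically

./my_gpu_program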

The test and skytest partitions

These partitions are meant for job testing. Jobs submitted to these partitions have elevated priority to ensure a quick start time. There are strict limits on the wall time and the number of CPUs a test job can use. It is also three times more expensive to run through the job testing partitions. Furthermore, users with a fair-share factor of less than 0.25 cannot use these queues until their fair share rises above that threshold.

test partition limits

The test partition targets Phoenix nodes which have Haswell cores and K80 GPUs.

test job constraints
Max Job time 02:00:00
Max RAM per cpu 4 GB
Max RAM per node 64 GB
Cost 3 SU per CPU hour, 24 SU per GPU hour
Max number of cpus 16
Max number of GPUs 4
Can use GPUs Yes
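
A short test run within these limits might look like the following sketch (./my_test_program is a placeholder; drop the --gres line for a CPU-only test):

#!/bin/bash
#SBATCH -p test
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --time=00:30:00       # well within the 2 hour limit
#SBATCH --mem-per-cpu=4G
#SBATCH --gres=gpu:1          # optional: request one GPU (K80 on these nodes)

./my_test_program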

skytest partition limits

The skytest partition targets Phoenix nodes which have Skylake cores and V100 GPUs.

skytest job constraints
Max Job time 02:00:00
Max RAM per cpu 9665 MB
Max RAM per node 190 GB
Cost 3.75 SU per CPU hour, 96 SU per GPU hour
Max number of cpus 20
Max number of GPUs 2
Can use GPUs Yes

The highmem partition

This partition is meant for jobs that have high memory requirements. The Phoenix cluster currently holds 3 nodes with 512 GB of RAM and 3 nodes with 1.5 TB of RAM. Demand for these nodes can fluctuate. Make sure to check your memory requirements thoroughly before submitting to the highmem partition to avoid unnecessary queuing times.

highmem partition limits

The highmem partition targets Phoenix nodes which have Haswell cores and a minimum of 512 GB of RAM.

highmem job constraints
Max Job time 3-00:00:00
Max RAM per cpu 16 GB or 48 GB
Max RAM per node 503 GB or 1511 GB
Cost 1 SU per CPU hour
Can use GPUs No
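
A sketch of a large-memory job targeting one of the 512 GB nodes (./my_highmem_program is a placeholder):

#!/bin/bash
#SBATCH -p highmem
#SBATCH -N 1
#SBATCH -n 32
#SBATCH --time=1-00:00:00
#SBATCH --mem=480G            # well beyond what the standard nodes provide

./my_highmem_program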
