# GPU computing at TeideHPC
The TeideHPC cluster has a number of nodes with NVIDIA general-purpose graphics processing units (GPGPUs) attached. You can use the CUDA toolkit to run computational work on them and, in some use cases, see very significant speedups.

As explained in the how to run jobs section, there are three different ways to send a job to the job queue: using an interactive session, launching the application in real time, or by means of an execution script.
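As a quick orientation, these are the typical Slurm commands behind each method (a minimal sketch; the partition name, options, and script name are placeholders):

```bash
salloc --partition=gpu --gres=gpu:1   # open an interactive session on a GPU node
srun --gres=gpu:1 nvidia-smi          # launch an application in real time
sbatch my_gpu_job.sh                  # submit an execution (batch) script
```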
## GPUs on Slurm
To request a single GPU on Slurm, just add `#SBATCH --gres=gpu` to your submission script and it will give you access to a GPU. To request multiple GPUs, add `#SBATCH --gres=gpu:n`, where `n` is the number of GPUs.

So if you want 1 CPU and 2 GPUs from our general-use GPU nodes in the `gpu` partition, you would specify:
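A minimal sketch of the corresponding directives (resource values taken from the sentence above):

```bash
#SBATCH --partition=gpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:2
```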
If you prefer an interactive session, you can use:
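For example, this sketch uses `srun` to open an interactive shell on a GPU node (the partition name and resource values are illustrative):

```bash
srun --partition=gpu --gres=gpu:2 --cpus-per-task=1 --pty bash
```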
While on a GPU node, you can run `nvidia-smi` to get information about the assigned GPUs.
## Specifying the GPU type or MIG partition to use
The GPU models currently available on our cluster can be found here, but as explained in the MIG section, you can specify which GPU type or MIG partition to use. There are two methods that can be used.

Visit the request GPU and compute resources page for a detailed explanation:
### Select GPU using `--constraint=`
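A minimal sketch, assuming the node feature is named after the GPU model (the actual feature names on TeideHPC may differ; check them with `scontrol show node <node>`):

```bash
#SBATCH --gres=gpu:1
#SBATCH --constraint=a100   # assumed feature name; verify on your cluster
```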
### Select GPU using `--gres=gpu:model:1` or `--gres=gpu:mig-partition:1`
Note that `--gres` specifies the resources on a per-node basis, so for multi-node work you only need to specify how many GPUs you need per node.
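For example, using the model names listed below, the following sketch allocates one full A100 on each of two nodes, i.e. two GPUs in total:

```bash
#SBATCH --nodes=2
#SBATCH --gres=gpu:a100:1   # one A100 per node
```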
## List of GPU models and partitions
To find out what specific types of GPUs are available on a partition, run `scontrol show partition <partition>`:
```bash
$ scontrol show partition express
PartitionName=express
...
MaxNodes=UNLIMITED MaxTime=03:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=node0303-2,node0304-[1-4],node1301-[1-4],node1302-[1-4],node1303-[1-4],
...
State=UP TotalCPUs=2424 TotalNodes=88 SelectTypeParameters=NONE
...
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
TRES=cpu=2424,mem=7565306M,node=88,billing=2424,gres/gpu=79,gres/gpu:1g.5gb=2,gres/gpu:2g.10gb=1,gres/gpu:3g.20gb=1,gres/gpu:a100=71,gres/gpu:t4=4
```
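The `gres/...` entries in the `TRES=` line are the names you can pass to `--gres`. Alternatively, `sinfo` can summarize the GRES configured in a partition (a minimal sketch; `%P` prints the partition, `%N` the node list, and `%G` the GRES):

```bash
sinfo -p express -o "%P %N %G"
```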
| GPU model | `--gres` name |
| --- | --- |
| Nvidia A100 | `gpu:a100` |
| Nvidia A100 MIG partitions (`a100-mig`) | `gpu:1g.5gb`, `gpu:2g.10gb`, `gpu:3g.20gb`, ... |
| Nvidia Tesla T4 | `gpu:t4` |
## Job script examples for GPU jobs

- Full Nvidia A100 GPU:
```bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:a100:1   # one full Nvidia A100
#SBATCH --mem=8G
#SBATCH --time=1:00:00

module purge
module load CUDA/12.0.0

nvidia-smi   # print information about the assigned GPU
sleep 20
```
- MIG partition on an Nvidia A100:
```bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:2g.10gb:1   # one 2g.10gb MIG partition of an A100
#SBATCH --mem=16G
#SBATCH --time=1:00:00

module purge
module load NVHPC/22.11-CUDA-11.8.0

nvidia-smi   # print information about the assigned MIG device
sleep 20
```
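Assuming one of the scripts above is saved as `gpu_job.sh` (the file name is a placeholder), it can be submitted and monitored as usual:

```bash
sbatch gpu_job.sh   # submit the job to the queue
squeue -u $USER     # check its state in the queue
```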
## More examples

Visit our repository on GitHub: https://github.com/hpciter/user_codes