Slurm memory limits#
Slurm imposes a memory limit on each job. By default the limit is deliberately small: 2 GB per node. If your job uses more than that, it will fail with an Exceeded job memory limit error.
To set a larger limit, add to your job submission:
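#SBATCH --mem=X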
where X is the maximum amount of memory your job will use per node, in MB.
The larger your working data set, the larger this value needs to be, but the smaller the request, the easier it is for the scheduler to find a place to run your job.
To determine an appropriate value, see the section on how to study job efficiency below.
How to study job efficiency#
A question that often comes up is how to figure out how efficient a job was; to study this you can use the seff command:
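seff JOBID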
where JOBID is the one you’re interested in. For example:
$ seff 1553
Job ID: 1553
Cluster: teide
User/Group: viddata/viddata
State: COMPLETED (exit code 0)
Nodes: 4
Cores per node: 16
CPU Utilized: 2-11:55:56
CPU Efficiency: 89.73% of 2-18:47:28 core-walltime
Job Wall-clock time: 01:02:37
Memory Utilized: 26.08 GB (estimated maximum)
Memory Efficiency: 23.85% of 109.38 GB (27.34 GB/node)
Request a little more memory than seff reports
You should set the memory you request to something a little larger than what seff reports.
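For the example above, seff reports roughly 26 GB used on a node, so a per-node request of around 28000-30000 MB would leave some headroom (the exact margin is a judgement call):

#SBATCH --mem=30000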
seff shows the maximum memory used in parallel jobs
Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could have very different values when run at different times.
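If you want reproducible per-node memory figures, you can fix the task distribution explicitly. A minimal sketch for a 4-node, 64-task job like the one above:

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16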
Wait for the job to finish successfully
Also note that the number recorded by Slurm for memory usage will be inaccurate if the job terminated due to being out of memory. To get an accurate measurement you must have a job that completes successfully, since Slurm will then record the true memory peak.
Maximum memory per node type#
| Node type | Slurm memory max request | Notes |
|---|---|---|
| Sandy Bridge, 16 cores, 32 GB | 30000 MB | |
| Sandy Bridge, 16 cores, 64 GB | 62000 MB | |
| Fat nodes, 32 cores, 256 GB | 254000 MB | upon request |
| Icelake GPU nodes, 64 cores, 256 GB | 254000 MB | upon request |
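For example, to use most of the memory on a 64 GB Sandy Bridge node:

#SBATCH --mem=62000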
How do I figure out how efficient my job is?#
You can also check your job efficiency by comparing MaxRSS, MaxVMSize, Elapsed, CPUTime and NCPUS with the sacct command:
sacct -j 999996 --format=User,JobID,Jobname%25,partition,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist%20
User JobID JobName Partition Elapsed MaxRSS MaxVMSize NNodes NCPUS NodeList
--------- ------------ ----------------- ---------- ---------- ---------- ---------- -------- ---------- --------------------
eolicase 999996 WRF_2023080618 batch 13:32:47 4 64 node1511-[1-4]
999996.batch batch 13:32:47 216700K 815628K 1 16 node1511-1
999996.0 geogrid.exe 00:00:57 236240K 2762564K 4 8 node1511-[1-4]
999996.1 metgrid.exe 00:01:54 15395M 591912K 4 4 node1511-[1-4]
999996.2 real.exe 00:00:26 338148K 2928876K 4 16 node1511-[1-4]
999996.3 wrf.exe 13:26:36 598864K 3768748K 4 64 node1511-[1-4]
In this job you can see that the user ran on 64 cores and the job's wall-clock time (Elapsed) was about 13.5 hours, so with full utilization of every core the total CPU time would be close to 64 * 13.5 = 864 core-hours. If your code scales effectively it follows the formula CPUTime = NCPUS * Elapsed; if it doesn't, the measured values will diverge from the formula, and the best way to test this and determine how to scale your application is to do some scaling testing.
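If you want sacct to report these values directly, you can add CPUTime (the allocated core-time, i.e. NCPUS * Elapsed) and TotalCPU (the CPU time actually consumed) to the format list, for example:

sacct -j 999996 --format=JobID,Elapsed,NCPUS,CPUTime,TotalCPU,MaxRSS

Comparing TotalCPU against CPUTime gives a quick estimate of how well the allocated cores were used.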
There are two kinds of scaling test you can run: strong scaling and weak scaling.
Strong scaling#
Strong scaling is where you keep the problem size the same but increase the number of cores. If your code scales well, the run time should decrease in proportion to the number of cores you use.
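A minimal way to run a strong-scaling test is to submit the same fixed-size case with an increasing core count and compare the run times (run_case.sh here is a hypothetical batch script for your own problem):

for n in 1 2 4 8 16 32 64; do
    sbatch --ntasks=$n --job-name=strong_$n run_case.sh
done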
Weak scaling#
In weak scaling, the amount of work per core remains the same but you increase the number of cores, so the total problem size grows in proportion to the number of cores. If your code scales well in this case, the run time should remain the same.
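A weak-scaling test looks similar, except that the problem size grows with the core count (run_case.sh and its --size argument are hypothetical; substitute whatever controls the size of your own problem):

base_size=1000000
for n in 1 2 4 8 16 32 64; do
    sbatch --ntasks=$n --job-name=weak_$n run_case.sh --size $((base_size * n))
done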
Typically most codes have a point where the scaling breaks down due to inefficiencies in the code.
Beyond that point there is no benefit to increasing the number of cores you throw at the problem; that is the point you want to look for. It is most easily seen by plotting the log of the number of cores against the log of the run time.
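As a quick sketch, if you collect your results in a two-column file of cores and run time in seconds (scaling.dat here is a hypothetical name), you can produce a log-log plot with gnuplot:

gnuplot -e "set logscale xy; set xlabel 'cores'; set ylabel 'run time (s)'; set terminal png; set output 'scaling.png'; plot 'scaling.dat' using 1:2 with linespoints title 'measured'"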