.. _Antwerp Slurm:

Slurm @ UAntwerp
================

This page covers basic Slurm use: starting jobs, basic job management,
and job script templates for various scenarios. It is the minimum a user
should master. A second page describes :ref:`more advanced use of Slurm `.

What and why?
-------------

Since the start of the VSC, Torque and Moab were used as the resource manager
and scheduler respectively. The resource manager is responsible for keeping
track of resources and making sure jobs use only the resources allocated to them.
The scheduler is the piece of software that prioritises the jobs waiting
in the queue and decides which job can start with which resources. The two
have to work together very closely.

Torque and Moab were developed and supported by Adaptive Computing. This company
was acquired by ALA Services Technology Companies. Since then the software has
no longer been well supported, which made it difficult to keep it running
on our systems. We therefore decided to switch to a different resource manager
and scheduler. Slurm Workload Manager was chosen because of its wide use in
academic supercomputer centres. We have been preparing for this switch for over
two years by stressing in the introductory courses those features of
Torque and Moab that most closely resemble Slurm features. Slurm Workload Manager
is also used on the clusters at UGent (but with a wrapper that still accepts
Torque job scripts, with some limitations) and will also be the scheduler on
Hortense, the successor of the BrENIAC Tier-1 system.

Historically, Slurm was an acronym of **S**\imple **L**\inux **U**\tility for
**R**\esource **M**\anagement. Development started around 2002 at Lawrence
Livermore National Lab as a resource manager for Linux clusters. Slurm has
always had a very modular architecture. From 2008 onwards, increasingly
sophisticated scheduling plugins were added to Slurm.
Nowadays it is used on some of the largest systems in the world. Slurm is
completely open source, though commercial support can be obtained from SchedMD,
a company that spun off from the Slurm development.

Slurm concepts
--------------

* **Nodes**: On the UAntwerp clusters (and most other clusters) a node is the
  largest part of the cluster running a single operating system image, and hence
  capable of supporting a shared memory program. Nodes are connected with each
  other through an interconnect, and communication between nodes is done via
  message passing.
* **Core**: A core is a physical core in a system.
* **CPU**: A CPU is a virtual core in a system, in other words, a hardware
  thread in a system with hyperthreading/SMT enabled. On a system with
  hyperthreading/SMT disabled, virtual cores are just physical cores.
* **Partition**: A group of nodes with limits and access controls, basically the
  equivalent of a queue in Torque. A node can be part of multiple partitions.
* **Job**: A resource allocation request.
* **Job step**: A set of (possibly parallel) tasks within a job. A job can
  consist of a single job step or of multiple job steps, which may use all or
  just a part of the resource allocation of the job and can run sequentially,
  in parallel, or in a mix of both. The job script itself is a special job step,
  called the batch job step, but additional job steps can be created (e.g., for
  running a parallel MPI application).
* **Task**: A task is executed within a job step and essentially corresponds to
  a Linux process: a single- or multithreaded process, or a single rank of an
  MPI program. Specifying the number of tasks to run simultaneously and the
  number of cores per task is a very convenient way to request resources from
  Slurm, as it makes starting an MPI or hybrid MPI/OpenMP program with the
  ``srun`` command afterwards very easy.

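
The relation between tasks and CPUs can be made concrete with a little shell
arithmetic. The following is only a minimal sketch: ``job_cpus`` is a
hypothetical helper name chosen for this illustration, and the fallback value
of 1 mirrors the Slurm defaults of 1 task and 1 CPU per task. It relies on the
variables ``SLURM_NTASKS`` and ``SLURM_CPUS_PER_TASK``, which Slurm sets in the
job environment when the corresponding options are given.

.. code-block:: bash

   #!/bin/bash

   # Hypothetical helper: CPUs occupied by a job = tasks x CPUs per task.
   # $1 = number of tasks, $2 = CPUs per task; both fall back to 1.
   job_cpus() {
       echo $(( ${1:-1} * ${2:-1} ))
   }

   # Inside a job these variables are set by Slurm if the options were given;
   # outside a job (or if they were not given) fall back to 1.
   ntasks="${SLURM_NTASKS:-1}"
   cpus_per_task="${SLURM_CPUS_PER_TASK:-1}"

   # A hybrid MPI/OpenMP program typically uses one OpenMP thread per CPU of a task.
   export OMP_NUM_THREADS="$cpus_per_task"

   echo "This job uses $(job_cpus "$ntasks" "$cpus_per_task") CPUs"

For example, a job with 4 tasks and 7 CPUs per task occupies 28 CPUs, and each
of the 4 processes would run 7 threads.
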
Slurm commands
--------------

Submitting a job script: sbatch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Slurm sbatch manual page on the web `_

``sbatch`` is the Slurm command to submit a job script. A job script first
contains a list of resources and other instructions to Slurm, followed by a set
of commands that will be executed on the first node of the job.

When the submission succeeds, ``sbatch`` prints a message containing the
unique job ID of the job.

Resource specifications and other instructions can be given in three
different ways: command line options, environment variables, and ``#SBATCH``
lines in the job script.

* Slurm ``sbatch`` has a lot of command line options. Only the most important
  ones are listed below. Command line options take precedence over environment
  variables and over ``#SBATCH`` lines in the job script.
* Some command line options can also be passed to ``sbatch`` through
  environment variables instead. A list of those can be found in the
  `sbatch manual page `_. The names of those variables start with ``SBATCH_``
  and the remaining part is derived from the matching command line option.
  However, be careful when setting them in ``.bashrc`` or ``.bash_profile``:
  they are easily forgotten, yet they have a higher priority than the options on
  ``#SBATCH`` lines, which are the most used mechanism to specify resources.
* All command line options can also be passed in ``#SBATCH`` lines in the job
  script. These lines should follow immediately below the shebang, in the first
  block of comment lines (lines that start with ``#``), as otherwise they will
  be ignored by Slurm.

Note that all ``sbatch`` command line options should be specified *before* the
name of the job script. All command line parameters specified *after* the name
of the job script are passed as command line arguments to the job script
when it executes.

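
The layout described above can be sketched as follows. This is an illustration
under stated assumptions, not a recommended template: the job name, the time
limit and the arguments ``input.dat 42`` are made up for the example. Because
``#SBATCH`` lines are ordinary comments to the shell, the generated job script
can also be run directly with ``bash`` to see how the arguments given after the
script name arrive in the script.

.. code-block:: bash

   #!/bin/bash

   # Write a small job script to a temporary file (illustration only).
   jobscript="$(mktemp)"
   cat > "$jobscript" <<'EOF'
   #!/bin/bash
   #SBATCH --job-name=demo
   #SBATCH --ntasks=1
   #SBATCH --time=00:05:00

   # The #SBATCH lines above sit in the first comment block, right below
   # the shebang. From here on the script is ordinary shell code.
   echo "Job script arguments: $*"
   EOF

   # On a cluster, sbatch options go before the script name and the
   # arguments for the script itself go after it, e.g.:
   #   sbatch --job-name=other "$jobscript" input.dat 42
   # Here the script is run with bash instead, so no Slurm is needed.
   result="$(bash "$jobscript" input.dat 42)"
   echo "$result"

   rm -f "$jobscript"

In the commented ``sbatch`` call, ``--job-name=other`` on the command line
would override the ``#SBATCH --job-name=demo`` line in the script, in line with
the precedence rules listed above.
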
Requesting compute resources
""""""""""""""""""""""""""""

Slurm supports several ways to request CPU cores and/or GPUs for a job. The
easiest way to request CPU cores is to follow the "task" idea of Slurm and
specify the number of parallel tasks and the number of cores per task that you
need. Specifying resources this way makes it very easy afterwards to start
OpenMP, MPI and hybrid MPI/OpenMP programs in the right configuration.

* The number of tasks is specified by ``--ntasks=<number>`` or ``-n <number>``.
  The ``=`` sign in the long option format can be replaced by a space, and in
  the short form (``-n``) the space between the flag and the value can also be
  omitted (this holds for all options).
* The number of CPUs (hardware threads) per task is specified by
  ``--cpus-per-task=<number>`` or ``-c <number>``. On the UAntwerp clusters,
  CPUs are physical cores (since hyperthreading is disabled).

For each task, all of the CPUs for that task are allocated on a single node.
When using multiple nodes, the allocated CPUs for all tasks are distributed
equally over all the nodes (except possibly for the last node).

Make sure to request a valid combination of tasks and/or CPUs per task.
Otherwise, your job can be rejected, or it could end up in the partition queue
and never start (in that case, check the reason code, as explained later in
this document in the section on checking the queue).

If these options are set, Slurm sets the corresponding variables,
``SLURM_NTASKS`` and ``SLURM_CPUS_PER_TASK`` respectively, in the environment
of the running job. If they are not set, the default values of 1 task and
1 CPU per task are used.

Requesting memory
"""""""""""""""""

Slurm jobs can also request an amount of RAM (resident memory). On the
UAntwerp clusters, swapping is disabled for jobs, since the nodes don't have
drives suitable for the load caused by swapping and since swapping is extremely
detrimental to the performance of the cluster. Therefore, swap space cannot be
requested.

Slurm has various ways to request memory. Unfortunately, there is currently no
way to request memory per task. The preferred method for requesting memory on
the UAntwerp clusters is to specify the amount of memory per CPU (per core on
the UAntwerp clusters): ``--mem-per-cpu=<amount><unit>`` (e.g.,
``--mem-per-cpu=1g``). The amount is an integer; the unit can be either ``k``
for kilobytes, ``m`` for megabytes or ``g`` for gigabytes.

The job will be rejected if the final amount of memory requested cannot be
satisfied. This could happen if ``--mem-per-cpu`` times the number of CPUs on a
node is greater than the memory on that node that is available for job
allocations. Note that on the UAntwerp clusters, the memory available for job
allocations is somewhat less than the total memory installed on a node (some
memory is kept for the OS and file system buffers).

If not set, a default value is used, equal to the total memory available for
job allocations on that node divided by the number of CPUs.

The amount of available memory per CPU is available in the environment of the
running job via the variable ``SLURM_MEM_PER_CPU``, as an integer with
megabytes as unit.

Requesting wall time
""""""""""""""""""""

The requested compute time is specified using ``--time=