Submitting Batch Jobs

SLURM

Elja uses SLURM as the batch scheduler and resource manager. Basic common commands are summarized below.

Command                   Description
sbatch                    submit a batch job script
srun                      run a parallel job
squeue (-a, -u $USER)     show queue status
sinfo                     view info about nodes and partitions
scancel JOBID             cancel a job
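
For example, srun can launch a short parallel command directly; the resource values below are illustrative:

[..]$ srun --ntasks=4 --time=00:10:00 hostname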

Fairshare

The cluster uses the Slurm fairshare algorithm, which orders job execution in the queue based on a fairshare factor. This factor is a floating-point value between 0.0 and 1.0, calculated from quantities such as the user's recent resource usage and allocated share of the cluster. More details about this equation can be found here, and further information about fairshare is available on the official Slurm website here and here.
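
As a rough sketch, the classic fairshare factor described in the Slurm documentation has the form

F = 2^(-U/S)

where U is the user's effective (decayed) past usage and S is the user's normalized share, so heavy recent usage lowers a user's priority. This is a simplification; the exact equation is given in the links above.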

Job Array

Submitting many separate Slurm jobs that run the same program with different parameters can occupy nodes and limit access for other users. Job arrays allow you to submit and manage such a collection of similar jobs efficiently: they are submitted in a single call, provided they share the same options.

To use a job array, add the following line to your sbatch script:

#SBATCH --array=1-5 # for example, task IDs 1 through 5

Each task in the array gets its own value of $SLURM_ARRAY_TASK_ID, which you can pass as a parameter to the program you want to run:

mpirun python job.py $SLURM_ARRAY_TASK_ID
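
Putting it together, a minimal job-array submit script could look like the sketch below (the job name, time limit, and output file pattern are illustrative):

#!/bin/bash
#SBATCH --job-name=array_example # illustrative job name
#SBATCH --partition=48cpu_192mem
#SBATCH --array=1-5 # five tasks with IDs 1 through 5
#SBATCH --ntasks=1 # one task per array element
#SBATCH --time=0-01:00:00 # illustrative time limit
#SBATCH --output=array_%A_%a.log # %A = array job ID, %a = task ID
python job.py $SLURM_ARRAY_TASK_ID

Each of the five tasks runs job.py once, with $SLURM_ARRAY_TASK_ID set to 1 through 5.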

Information on creating a batch submit script can be found in the Batch Jobs section below.

Batch Jobs

The command sbatch is used to submit jobs to the SLURM queue:

[..]$ sbatch submit_script
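
If the script is accepted, sbatch replies with the assigned job ID, for example:

Submitted batch job 11729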

A batch submit script usually starts like this:

#!/bin/bash
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<Your E-mail> # for example uname@hi.is
#SBATCH --partition=48cpu_192mem # request node from a specific partition
#SBATCH --nodes=2 # number of nodes
#SBATCH --ntasks-per-node=48 # 48 cores per node (96 in total)
#SBATCH --mem-per-cpu=3900 # MB RAM per cpu core
#SBATCH --time=0-04:00:00 # run for 4 hours maximum (DD-HH:MM:SS)
#SBATCH --hint=nomultithread # do not use hyper-threading
#SBATCH --output=slurm_job_output.log # standard output log
#SBATCH --error=slurm_job_errors.log # error log (also captures crashes)
. ~/.program_env_bash
mpirun python job.py

This example requests two nodes from the 48cpu_192mem partition, using 48 processors per node for a total of 96 processors. The memory per CPU core is set to 3900 MB RAM. See Partitions & Hardware for details on the available partitions.

Once the SLURM scheduler has allocated the resources, the subsequent lines are executed in order: first the program environment bash script is sourced (see Program Environment), and then an mpirun instance of a Python script is executed.

Hyper-threading is enabled by default on the Intel-based CPUs, so it is highly recommended to suppress it in your submit script (or .bashrc), unless your software supports it and is correctly compiled with OpenMP.

For .bashrc:

export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1

After submitting a job, you can view its current status and job ID like this:

[..]$ squeue -u $USER
JOBID PARTITION NAME     USER    ST TIME NODES NODELIST(REASON)
11729 48cpu_192 Interact <uname> R  2:10     1 compute-17

You can cancel a job using the JOBID number. For example:

[..]$ scancel 11729
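
scancel also accepts a user filter, so you can cancel all of your own pending and running jobs at once:

[..]$ scancel -u $USER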

If your job requires a lot of input data or generates a lot of output, it's advisable to make use of the /scratch/ disk available on the compute nodes. See the next section.