Submitting MPI jobs (multi-process) – Information Technology Services

How do you want the processes to be distributed?

All processes on the same node, to reduce network latency:

sample script – sample.sub

#!/bin/bash
# Submission script: "tasks are all grouped on same node"

# Job name
#SBATCH --job-name=mpi_mm
# Output file name
#SBATCH --output=mpi_mm_v2.out
#SBATCH --error=mpi_mm_v2.err
#
# Set the required partition [change]
#SBATCH --partition=short
# Number of processes
#SBATCH --ntasks=32
# Number of nodes
#SBATCH --nodes=1
# Memory per CPU in MB (one CPU per process, so this is memory per process)
#SBATCH --mem-per-cpu=100
#
# Total wall-time
#SBATCH --time=00:05:00
#
# Use at most one hardware thread per core (avoids hyperthreading); recommended for floating-point-intensive, CPU-bound codes [Optional]
#SBATCH --threads-per-core=1
#
# To receive email alerts [Optional]: remove one "#" from each of the two lines below and enter your email ID
# (e.g. #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=your_email_id
##SBATCH --mail-type=ALL
#
date
mpirun /home/hemanta.kumar/slurm_test/mpi_mm
#srun /home/hemanta.kumar/slurm_test/mpi_mm
date
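Packing every task onto one node only works if that node has enough cores and memory. A minimal sketch of the arithmetic, assuming one CPU per task (the sbatch default) and a hypothetical core count per node that you should replace with your cluster's real value (`scontrol show node` reports it):

```bash
# Hypothetical helpers for checking a single-node MPI request.
# Assumes one CPU per task, so --mem-per-cpu is effectively memory per process.

# Total memory (MB) the node must provide: ntasks x mem-per-cpu.
node_memory_mb() {
  ntasks=$1
  mem_per_cpu_mb=$2
  echo $(( ntasks * mem_per_cpu_mb ))
}

# Succeeds if ntasks single-CPU tasks fit within cores_per_node cores.
fits_on_one_node() {
  ntasks=$1
  cores_per_node=$2
  [ "$ntasks" -le "$cores_per_node" ]
}

# Values from the script above; 32 cores per node is an assumed example.
echo "memory needed on the node: $(node_memory_mb 32 100) MB"   # 3200 MB
if fits_on_one_node 32 32; then
  echo "32 tasks fit on a 32-core node"
fi
```

If the node count of cores or memory is too small, the job cannot start with `--nodes=1`; one of the multi-node layouts below is then the alternative.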

Scatter the processes across distinct nodes to increase overall memory bandwidth:

sample script – sample.sub

#!/bin/bash
# Submission script: "tasks are scattered across distinct nodes"

# Job name
#SBATCH --job-name=mpi_mm
# Output file name
#SBATCH --output=mpi_mm_v3.out
#SBATCH --error=mpi_mm_v3.err
#
# Set the required partition [change]
#SBATCH --partition=short
# Number of processes
#SBATCH --ntasks=2
# One process per node, so the 2 processes run on 2 distinct nodes
#SBATCH --ntasks-per-node=1
# Memory per CPU in MB (one CPU per process, so this is memory per process)
#SBATCH --mem-per-cpu=100
#
# Total wall-time
#SBATCH --time=00:05:00
#
# Use at most one hardware thread per core (avoids hyperthreading); recommended for floating-point-intensive, CPU-bound codes [Optional]
#SBATCH --threads-per-core=1
#
# To receive email alerts [Optional]: remove one "#" from each of the two lines below and enter your email ID
# (e.g. #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=your_email_id
##SBATCH --mail-type=ALL
#
date
mpirun /home/hemanta.kumar/slurm_test/mpi_mm
#srun /home/hemanta.kumar/slurm_test/mpi_mm
date
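With `--ntasks-per-node=1` each process gets its own node, so this job occupies two nodes. The node count implied by any `--ntasks`/`--ntasks-per-node` pair is a ceiling division; the helper below is a hypothetical illustration of that arithmetic, not a Slurm command:

```bash
# Hypothetical helper: nodes implied by --ntasks and --ntasks-per-node,
# i.e. ceil(ntasks / ntasks_per_node) using integer arithmetic.
nodes_needed() {
  ntasks=$1
  per_node=$2
  echo $(( (ntasks + per_node - 1) / per_node ))
}

nodes_needed 2 1    # scatter script above: prints 2
nodes_needed 64 1   # 64 tasks, one per node: prints 64
```

A fully scattered layout ties up one node per process, so for larger task counts it may mean a longer wait in the queue.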

Even distribution of processes across nodes:

sample script – sample.sub

#!/bin/bash
# Submission script: "tasks are evenly distributed across nodes"

# Job name
#SBATCH --job-name=mpi_mm
# Output file name
#SBATCH --output=mpi_mm_v1.out
#SBATCH --error=mpi_mm_v1.err
#
# Set the required partition [change]
#SBATCH --partition=short
# Number of processes
#SBATCH --ntasks=32
# Process distribution per node
#SBATCH --ntasks-per-node=8
# Number of nodes
#SBATCH --nodes=4
# Memory per CPU in MB (one CPU per process, so this is memory per process)
#SBATCH --mem-per-cpu=100
#
# Total wall-time
#SBATCH --time=00:05:00
#
# Use at most one hardware thread per core (avoids hyperthreading); recommended for floating-point-intensive, CPU-bound codes [Optional]
#SBATCH --threads-per-core=1
#
# To receive email alerts [Optional]: remove one "#" from each of the two lines below and enter your email ID
# (e.g. #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=your_email_id
##SBATCH --mail-type=ALL
#
date
mpirun /home/hemanta.kumar/slurm_test/mpi_mm
#srun /home/hemanta.kumar/slurm_test/mpi_mm
date
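Here `--nodes=4` times `--ntasks-per-node=8` exactly covers `--ntasks=32`. The sketch below checks that arithmetic and prints where ranks would go under a block placement (consecutive ranks fill one node before the next). Whether `mpirun` actually places ranks this way depends on the MPI library and its options, so treat the printed layout as an assumption to verify, for example with `srun hostname` inside the job. Both function names are hypothetical.

```bash
# Hypothetical helper: succeed if --nodes x --ntasks-per-node covers --ntasks.
layout_ok() {
  nodes=$1
  per_node=$2
  ntasks=$3
  [ $(( nodes * per_node )) -ge "$ntasks" ]
}

# Hypothetical helper: print the rank range per node under block placement.
print_block_layout() {
  nodes=$1
  per_node=$2
  n=0
  while [ "$n" -lt "$nodes" ]; do
    first=$(( n * per_node ))
    last=$(( first + per_node - 1 ))
    echo "node $n: ranks $first-$last"
    n=$(( n + 1 ))
  done
}

# Values from the script above.
layout_ok 4 8 32 && print_block_layout 4 8
# node 0: ranks 0-7
# node 1: ranks 8-15
# node 2: ranks 16-23
# node 3: ranks 24-31
```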

Let the scheduler choose:

sample script – sample.sub

#!/bin/bash
# Submission script: "no plan"

# Job name
#SBATCH --job-name=mpi_mm
# Output file name
#SBATCH --output=mpi_mm_v4.out
#SBATCH --error=mpi_mm_v4.err
#
# Set the required partition [change]
#SBATCH --partition=short
# Number of processes
#SBATCH --ntasks=64
# Memory per CPU in MB (one CPU per process, so this is memory per process)
#SBATCH --mem-per-cpu=100
#
# Total wall-time
#SBATCH --time=00:05:00
#
# Use at most one hardware thread per core (avoids hyperthreading); recommended for floating-point-intensive, CPU-bound codes [Optional]
#SBATCH --threads-per-core=1
#
# To receive email alerts [Optional]: remove one "#" from each of the two lines below and enter your email ID
# (e.g. #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=your_email_id
##SBATCH --mail-type=ALL
#
date
mpirun /home/hemanta.kumar/slurm_test/mpi_mm
#srun /home/hemanta.kumar/slurm_test/mpi_mm
date
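When the scheduler chooses the layout, it is worth recording what it picked. Slurm exports the allocation to the job's environment as `SLURM_JOB_ID`, `SLURM_NTASKS`, `SLURM_JOB_NUM_NODES` and `SLURM_JOB_NODELIST`. A minimal sketch of a helper (the function name is hypothetical) that could be called next to the `date` lines of the script:

```bash
# Hypothetical helper: log the allocation Slurm chose for this job.
# Outside a Slurm job the variables are unset and "unknown" is printed.
report_allocation() {
  echo "Job ${SLURM_JOB_ID:-unknown}: ${SLURM_NTASKS:-unknown} tasks on" \
       "${SLURM_JOB_NUM_NODES:-unknown} node(s): ${SLURM_JOB_NODELIST:-unknown}"
}

report_allocation
```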

Submit job:

sbatch sample.sub
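In scripts it is often handy to capture the job ID at submission time. `sbatch --parsable` prints only the job ID (followed by `;cluster` on multi-cluster setups). A small wrapper, with a hypothetical name, might look like:

```bash
# Hypothetical helper: submit a job script and print only its numeric job ID.
# sbatch --parsable prints "jobid" or "jobid;cluster"; keep the part before ';'.
submit_job() {
  out=$(sbatch --parsable "$@") || return 1
  id=${out%%;*}
  case $id in
    ''|*[!0-9]*) echo "unexpected sbatch output: $out" >&2; return 1 ;;
  esac
  echo "$id"
}

# Usage:
# jobid=$(submit_job sample.sub) && echo "submitted job $jobid"
```

Extra `sbatch` options can be passed before the script name, since all arguments are forwarded unchanged.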

Monitor the job's status in the queue with squeue (add -u username to show only that user's jobs).
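For a script that has to wait for a job before continuing, `squeue` can be polled until the job is no longer pending or running. The helper below is a hypothetical sketch: `-h` suppresses the header, `--jobs` selects one job and `--states` restricts output to active states, so empty output is taken to mean the job has finished. A temporary `squeue` failure would also end the loop early, so check the final state separately (for example with `sacct -j <job_id>`) if that matters.

```bash
# Hypothetical helper: block until a job is no longer listed as active by squeue.
# $1 = job ID, $2 = polling interval in seconds (default 30).
wait_for_job() {
  job=$1
  interval=${2:-30}
  # Empty output (or an error, discarded) means the job is no longer active.
  while [ -n "$(squeue -h --jobs="$job" \
      --states=PENDING,CONFIGURING,RUNNING,SUSPENDED,COMPLETING 2>/dev/null)" ]; do
    sleep "$interval"
  done
}

# Usage:
# squeue -u "$USER"        # list your own jobs
# wait_for_job 123456 60   # poll every 60 seconds
```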

Cancel a job with scancel <job_id>.

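When the job finishes (successfully or with an error), its output is written to the submission directory. If no --output or --error is given, stdout and stderr both go to a single file named slurm-NNNN.out (where NNNN is the job ID); the sample scripts above set --output and --error, so their output goes to the named .out and .err files instead.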
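To get a separate output file per run instead of reusing one fixed name, sbatch expands `%j` in `--output`/`--error` to the job ID (for example `--output=mpi_mm_%j.out`). To locate the file for a given job from a script, a hypothetical helper that expands only `%j` (Slurm supports other placeholders that this sketch ignores) could be:

```bash
# Hypothetical helper: predict a job's stdout file name by expanding %j only.
# With no pattern given, Slurm's default slurm-%j.out is assumed.
output_file_for_job() {
  jobid=$1
  pattern=${2:-slurm-%j.out}
  printf '%s\n' "$pattern" | sed "s/%j/$jobid/g"
}

output_file_for_job 4242                   # slurm-4242.out
output_file_for_job 4242 'mpi_mm_%j.out'   # mpi_mm_4242.out
```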

Submit script flags

Resource	Flag syntax	Description	Notes
job name	-J, --job-name=hello_test	Name of job	default is the name of the batch script
partition	-p, --partition=devel	Partition (queue) to which the job is submitted	the default partition is marked with * in sinfo output; devel is the default partition on Mario
time	-t, --time=01:00:00	Time limit for the job. Acceptable time formats include minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes and days-hours:minutes:seconds	here it is given as 1 hour
nodes	-N, --nodes=2	Number of compute nodes for the job	default is 1 compute node
number of tasks	-n, --ntasks=1	Maximum number of tasks the job will launch; Slurm allocates sufficient resources for them	default is 1 task per node
ntasks on each node	--ntasks-per-node=8	Request that ntasks be invoked on each node. If used with the --ntasks option, --ntasks takes precedence and --ntasks-per-node is treated as a maximum count of tasks per node	default is 1 task per node
memory	--mem=32000	Memory limit per compute node for the job. Do not use with the --mem-per-cpu flag	default unit is MB
memory per CPU	--mem-per-cpu=1000	Memory limit per allocated CPU (core). Do not use with the --mem flag	default unit is MB
output file	-o, --output=test.out	Name of file for stdout	default is slurm-<JobID>.out
error file	-e, --error=test.err	Name of file for stderr	default: stderr goes to the same file as stdout
email address	--mail-user=username@buffalo.edu	User's email address	receives the notifications selected by --mail-type; omit for no email
email notification	--mail-type=ALL, --mail-type=END	Events that trigger an email to the user (e.g. BEGIN, END, FAIL, ALL)	omit for no email
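To see how the `--time` formats in the table map onto a wall-clock limit, the following sketch converts each accepted form to seconds. It only illustrates the format rules listed above (both function names are hypothetical), it is not something Slurm provides, and it does not validate its input:

```bash
# Hypothetical helper: strip leading zeros so "08" is not read as octal.
strip_zeros() {
  zeros=${1%%[!0]*}
  rest=${1#"$zeros"}
  echo "${rest:-0}"
}

# Hypothetical helper: convert a Slurm --time value to seconds.
# Accepted forms: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S
slurm_time_to_seconds() {
  t=$1
  d=0
  case $t in
    *-*)
      d=$(strip_zeros "${t%%-*}")
      old_ifs=$IFS; IFS=:
      set -- ${t#*-}            # hours[:minutes[:seconds]] after "days-"
      IFS=$old_ifs
      h=${1:-0}; m=${2:-0}; s=${3:-0}
      ;;
    *)
      old_ifs=$IFS; IFS=:
      set -- $t
      IFS=$old_ifs
      case $# in
        1) h=0;  m=$1; s=0 ;;   # minutes
        2) h=0;  m=$1; s=$2 ;;  # minutes:seconds
        3) h=$1; m=$2; s=$3 ;;  # hours:minutes:seconds
        *) return 1 ;;
      esac
      ;;
  esac
  h=$(strip_zeros "$h"); m=$(strip_zeros "$m"); s=$(strip_zeros "$s")
  echo $(( ((d * 24 + h) * 60 + m) * 60 + s ))
}

slurm_time_to_seconds 00:05:00   # 300, the limit used in the sample scripts
slurm_time_to_seconds 1-12       # 129600 (1 day 12 hours)
```

Note that a bare number is minutes, so `--time=90` means 90 minutes, not 90 seconds.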
