Submitting MPI jobs (multi-process)

How do you want the processes to be distributed?

Place all processes on the same node to reduce network latency:

sample script – sample.sub

#!/bin/bash
# Submission script: "tasks are all grouped on same node"

# Job name
#SBATCH --job-name=mpi_mm
# Output file name
#SBATCH --output=mpi_mm_v2.out
#SBATCH --error=mpi_mm_v2.err
#
# Set the required partition [change]
#SBATCH --partition=long
# Number of processes
#SBATCH --ntasks=32
# Number of nodes
#SBATCH --nodes=1
# Memory per process
#SBATCH --mem-per-cpu=100
#
# Total wall-time
#SBATCH --time=00:05:00
#
# The below statement is required if the code is floating-point intensive and CPU-bound [optional]
#SBATCH --threads-per-core=1
#
# To get email alerts [optional]
# NOTE: Remove one "#" from each of the two lines below and fill in your email ID (ex: #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=<email id>
##SBATCH --mail-type=ALL
#
date
mpirun --mca btl_openib_allow_ib 1 /home/hemanta.kumar/slurm_test/mpi_mm
date
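
To confirm that the tasks really ended up on a single node, the hostnames reported by the tasks can be counted. The following is a minimal sketch, not part of the original script: the helper name tasks_per_node is hypothetical, and it assumes the hostnames come from something like "srun hostname" run inside the job.

```shell
#!/bin/bash
# Hypothetical helper: reads one hostname per line on stdin and prints
# "<hostname> <task count>" for each distinct node, sorted by hostname.
tasks_per_node() {
    sort | uniq -c | awk '{print $2, $1}'
}

# Inside a job script the input would typically come from the tasks
# themselves (assumption):  srun hostname | tasks_per_node
# Stand-in input so the sketch runs without a scheduler:
printf 'node01\nnode01\nnode01\nnode01\n' | tasks_per_node
```

A single output line means every task ran on the same node; several lines mean the tasks were spread out.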

Scatter the processes across distinct nodes to increase overall memory bandwidth:

sample script – sample.sub

#!/bin/bash
# Submission script: "tasks are scattered across distinct nodes"

# Job name
#SBATCH --job-name=mpi_mm
# Output file name
#SBATCH --output=mpi_mm_v3.out
#SBATCH --error=mpi_mm_v3.err
#
# Set the required partition [change]
#SBATCH --partition=long
# Number of processes
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
# Memory per process
#SBATCH --mem-per-cpu=100
#
# Total wall-time
#SBATCH --time=00:05:00
#
# The below statement is required if the code is floating-point intensive and CPU-bound [Optional]
#SBATCH --threads-per-core=1
#
# To get email alerts [optional]
# NOTE: Remove one "#" from each of the two lines below and fill in your email ID (ex: #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=<email id>
##SBATCH --mail-type=ALL
#
date
mpirun --mca btl_openib_allow_ib 1 /home/hemanta.kumar/slurm_test/mpi_mm
date

Even distribution of processes across nodes:

sample script – sample.sub

#!/bin/bash
# Submission script: "tasks are evenly distributed across nodes"

# Job name
#SBATCH --job-name=mpi_mm
# Output file name
#SBATCH --output=mpi_mm_v1.out
#SBATCH --error=mpi_mm_v1.err
#
# Set the required partition [change]
#SBATCH --partition=long
# Number of processes
#SBATCH --ntasks=32
# Process distribution per node
#SBATCH --ntasks-per-node=8
# Number of nodes
#SBATCH --nodes=4
# Memory per process
#SBATCH --mem-per-cpu=100
#
# Total wall-time
#SBATCH --time=00:05:00
#
# The below statement is required if the code is floating-point intensive and CPU-bound [Optional]
#SBATCH --threads-per-core=1
#
# To get email alerts [optional]
# NOTE: Remove one "#" from each of the two lines below and fill in your email ID (ex: #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=<email id>
##SBATCH --mail-type=ALL
#
date
mpirun --mca btl_openib_allow_ib 1 /home/hemanta.kumar/slurm_test/mpi_mm
date
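
The three values in this script have to agree: 32 tasks = 4 nodes x 8 tasks per node. A minimal sketch of that check follows; the helper name layout_is_even is hypothetical, and it is plain shell arithmetic rather than a Slurm command.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when ntasks == nodes * ntasks_per_node.
# Arguments: <ntasks> <nodes> <ntasks_per_node>
layout_is_even() {
    [ "$1" -eq $(( $2 * $3 )) ]
}

# Values taken from the script above: --ntasks=32 --nodes=4 --ntasks-per-node=8
if layout_is_even 32 4 8; then
    echo "32 tasks fill 4 nodes with 8 tasks each"
else
    echo "layout is uneven"
fi
```

Running the same check before editing --ntasks or --nodes is a cheap way to catch a request that cannot be spread evenly.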

Let the scheduler choose:

sample script – sample.sub

#!/bin/bash
# Submission script: "no plan"

# Job name
#SBATCH --job-name=mpi_mm
# Output file name
#SBATCH --output=mpi_mm_v4.out
#SBATCH --error=mpi_mm_v4.err
#
# Set the required partition [change]
#SBATCH --partition=long
# Number of processes
#SBATCH --ntasks=64
# Memory per process
#SBATCH --mem-per-cpu=100
#
# Total wall-time
#SBATCH --time=00:05:00
#
# The below statement is required if the code is floating-point intensive and CPU-bound [Optional]
#SBATCH --threads-per-core=1
#
# To get email alerts [optional]
# NOTE: Remove one "#" from each of the two lines below and fill in your email ID (ex: #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=<email id>
##SBATCH --mail-type=ALL
#
date
mpirun --mca btl_openib_allow_ib 1 /home/hemanta.kumar/slurm_test/mpi_mm
date

Submit job:

sbatch sample.sub

The job’s status in the queue can be monitored with squeue (add -u username to show only a particular user’s jobs).

The job can be cancelled with scancel <job_id>.

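When the job finishes (successfully or with an error), its output is in the submission directory. The sample scripts above set --output and --error, so stdout and stderr go to those files (for example mpi_mm_v2.out and mpi_mm_v2.err). If neither flag is given, Slurm writes both streams to a single file named slurm-NNNN.out, where NNNN is the job ID.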
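
The submit, monitor and cancel steps all revolve around the job ID, which sbatch prints as "Submitted batch job <id>" on success. A minimal sketch of capturing it, assuming that message format; the helper name parse_job_id is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extracts the numeric job ID from the line that
# sbatch prints on success ("Submitted batch job <id>").
parse_job_id() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ {print $4}'
}

# On the cluster this would be used roughly as:
#   job_id=$(parse_job_id "$(sbatch sample.sub)")
#   squeue -j "$job_id"      # check the job's status
#   scancel "$job_id"        # cancel it if needed
# Stand-in message so the sketch runs without a scheduler:
job_id=$(parse_job_id "Submitted batch job 12345")
echo "job id: $job_id"
```

If sbatch fails and prints something else, the helper returns an empty string, which is worth checking before passing it to squeue or scancel.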

Submit script flags

| Resource | Flag Syntax | Description | Notes |
|---|---|---|---|
| job name | -J, --job-name=hello_test | Name of the job | default is the name of the batch script |
| partition | -p, --partition=devel | Partition (queue) the job is submitted to | the default partition is marked with *; devel is the default partition on Mario |
| time | -t, --time=01:00:00 | Time limit for the job. Acceptable time formats include minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes and days-hours:minutes:seconds | here it is given as 1 hour |
| nodes | -N, --nodes=2 | Number of compute nodes for the job | default is 1 compute node |
| number of tasks | -n, --ntasks=1 | Maximum number of tasks the job will launch, so that sufficient resources are allocated | default is 1 task per node |
| ntasks on each node | --ntasks-per-node=8 | Request that ntasks be invoked on each node. If used with the --ntasks option, --ntasks takes precedence and --ntasks-per-node is treated as a maximum count of tasks per node | default is 1 task per node |
| memory | --mem=32000 | Memory limit per compute node for the job. Do not use with the --mem-per-cpu flag | value is in MB by default |
| memory per CPU | --mem-per-cpu=1000 | Memory limit per core. Do not use with the --mem flag | value is in MB by default |
| output file | -o, --output=test.out | Name of the file for stdout | default is slurm-<JobID>.out |
| error file | -e, --error=test.err | Name of the file for stderr | default is the same file as stdout |
| email address | --mail-user=username@buffalo.edu | User's email address | email is sent on submission and completion of the job; omit for no email |
| email notification | --mail-type=ALL or --mail-type=END | When email is sent to the user | omit for no email |
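
The --time formats listed above are easy to mix up (a bare number means minutes, and two fields mean minutes:seconds, not hours:minutes). The following is a minimal sketch that converts each accepted format to seconds so a value can be checked before submitting. The helper name slurm_time_to_seconds is hypothetical; it is not a Slurm command and does no input validation.

```shell
#!/bin/bash
# Hypothetical helper: converts a --time value to seconds.
# Handles: minutes, minutes:seconds, hours:minutes:seconds,
#          days-hours, days-hours:minutes, days-hours:minutes:seconds
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk '
    {
        days = 0; h = 0; m = 0; s = 0
        rest = $0
        dash = index($0, "-")
        if (dash) {                       # a "days-" prefix is present
            days = substr($0, 1, dash - 1)
            rest = substr($0, dash + 1)
        }
        n = split(rest, f, ":")
        if (dash) {                       # days-hours[:minutes[:seconds]]
            h = f[1]
            if (n >= 2) m = f[2]
            if (n >= 3) s = f[3]
        } else if (n == 1) {              # minutes
            m = f[1]
        } else if (n == 2) {              # minutes:seconds
            m = f[1]; s = f[2]
        } else {                          # hours:minutes:seconds
            h = f[1]; m = f[2]; s = f[3]
        }
        total = ((days * 24 + h) * 60 + m) * 60 + s
        print total
    }'
}

slurm_time_to_seconds 00:05:00   # the wall-time used in the sample scripts
slurm_time_to_seconds 01:00:00   # the example from the table
```

The two calls print 300 and 3600, matching the 5-minute limit in the sample scripts and the 1-hour example in the table.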