
Submitting MPI jobs (multi-process)

How do you want the processes to be distributed?

All processes on the same node, to reduce network latency:

sample script – sample.sub

#!/bin/bash
# Submission script: "tasks are all grouped on same node"

# Job name
#SBATCH --job-name=mpi_mm
# Output and error file names
#SBATCH --output=mpi_mm_v2.out
#SBATCH --error=mpi_mm_v2.err
#
# Set the required partition [change]
#SBATCH --partition=short
# Number of processes
#SBATCH --ntasks=32
# Number of nodes
#SBATCH --nodes=1
# Memory per CPU in MB (one CPU per process by default, so this is the memory per process)
#SBATCH --mem-per-cpu=500
#
# Total wall-time
#SBATCH --time=00:05:00
#
# Use one hardware thread per core (no hyperthreading); recommended for floating-point-intensive, CPU-bound codes [Optional]
#SBATCH --threads-per-core=1
#
# To get email alerts [Optional]
# NOTE: To enable, remove one "#" from each of the two lines below and write your email ID after "=" (ex: #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=<your email ID>
##SBATCH --mail-type=ALL

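# Print the start time
date
# Launch the MPI program (replace the path with your own executable)
mpirun /home/hemanta.kumar/slurm_test/mpi_mm
# Alternative launcher: comment out the mpirun line above and uncomment the line below
#srun /home/hemanta.kumar/slurm_test/mpi_mm
# Print the end time
date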
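
Because every process in this layout runs on one node, that node must provide the memory for the whole job. Assuming one CPU per process (Slurm's default --cpus-per-task=1) and that --mem-per-cpu is in megabytes (Slurm's default unit), the requirement is processes per node × memory per process, i.e. 32 × 500 MB = 16000 MB here. The node sizes of the Mario cluster are not stated on this page, so check them (for example with sinfo -o "%N %c %m", which lists node names, CPUs and memory in MB) before requesting a large single-node job. A small sketch of the arithmetic, with a hypothetical helper name:

```shell
# Hypothetical helper: memory (MB) that one node must provide for a job.
# Assumes one CPU per process, so --mem-per-cpu equals the memory per process.
# Usage: mem_per_node_mb <processes_per_node> <mem_per_cpu_mb>
mem_per_node_mb() {
  echo $(( $1 * $2 ))
}

# Same-node layout above: 32 processes x 500 MB on a single node
echo "Memory needed on the node: $(mem_per_node_mb 32 500) MB"   # prints 16000 MB
```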

Processes scattered across distinct nodes, to increase the overall memory bandwidth:

sample script – sample.sub

#!/bin/bash
# Submission script: "tasks are scattered across distinct nodes"

# Job name
#SBATCH --job-name=mpi_mm
# Output and error file names
#SBATCH --output=mpi_mm_v3.out
#SBATCH --error=mpi_mm_v3.err
#
# Set the required partition [change]
#SBATCH --partition=short
# Number of processes
#SBATCH --ntasks=2
# At most one process per node, so the 2 processes run on 2 distinct nodes
#SBATCH --ntasks-per-node=1
# Memory per CPU in MB (one CPU per process by default, so this is the memory per process)
#SBATCH --mem-per-cpu=500
#
# Total wall-time
#SBATCH --time=00:05:00
#
# Use one hardware thread per core (no hyperthreading); recommended for floating-point-intensive, CPU-bound codes [Optional]
#SBATCH --threads-per-core=1
#
# To get email alerts [Optional]
# NOTE: To enable, remove one "#" from each of the two lines below and write your email ID after "=" (ex: #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=<your email ID>
##SBATCH --mail-type=ALL
#
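# Print the start time
date
# Launch the MPI program (replace the path with your own executable)
mpirun /home/hemanta.kumar/slurm_test/mpi_mm
# Alternative launcher: comment out the mpirun line above and uncomment the line below
#srun /home/hemanta.kumar/slurm_test/mpi_mm
# Print the end time
date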
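
This script sets no --nodes value. Since --ntasks-per-node=1 allows at most one process per node, the job needs at least as many nodes as processes; in general the minimum node count is --ntasks divided by --ntasks-per-node, rounded up. A sketch of that calculation (the function name is hypothetical), useful for seeing how quickly a scattered layout consumes nodes:

```shell
# Hypothetical helper: smallest number of nodes that can hold <ntasks> processes
# when each node runs at most <ntasks_per_node> of them (ceiling division).
# Usage: nodes_needed <ntasks> <ntasks_per_node>
nodes_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

nodes_needed 2 1    # this layout: prints 2
nodes_needed 64 1   # 64 scattered processes: prints 64
```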

Even distribution of processes across nodes:

sample script – sample.sub

#!/bin/bash
# Submission script: "tasks are evenly distributed across nodes"

# Job name
#SBATCH --job-name=mpi_mm
# Output and error file names
#SBATCH --output=mpi_mm_v1.out
#SBATCH --error=mpi_mm_v1.err
#
# Set the required partition [change]
#SBATCH --partition=short
# Number of processes
#SBATCH --ntasks=32
# Processes per node (32 processes / 4 nodes = 8 per node)
#SBATCH --ntasks-per-node=8
# Number of nodes
#SBATCH --nodes=4
# Memory per CPU in MB (one CPU per process by default, so this is the memory per process)
#SBATCH --mem-per-cpu=500
#
# Total wall-time
#SBATCH --time=00:05:00
#
# Use one hardware thread per core (no hyperthreading); recommended for floating-point-intensive, CPU-bound codes [Optional]
#SBATCH --threads-per-core=1
#
# To get email alerts [Optional]
# NOTE: To enable, remove one "#" from each of the two lines below and write your email ID after "=" (ex: #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=<your email ID>
##SBATCH --mail-type=ALL
#
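# Print the start time
date
# Launch the MPI program (replace the path with your own executable)
mpirun /home/hemanta.kumar/slurm_test/mpi_mm
# Alternative launcher: comment out the mpirun line above and uncomment the line below
#srun /home/hemanta.kumar/slurm_test/mpi_mm
# Print the end time
date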
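
Here the three values must agree: --ntasks should equal --nodes × --ntasks-per-node (4 × 8 = 32). If they do not, the request may be rejected or the processes may not be laid out as intended. A minimal pre-submission check, as a sketch with a hypothetical function name:

```shell
# Hypothetical pre-submission check for an even layout:
# returns 0 only if ntasks == nodes * ntasks_per_node.
# Usage: check_even_layout <ntasks> <ntasks_per_node> <nodes>
check_even_layout() {
  if [ "$1" -eq $(( $3 * $2 )) ]; then
    echo "OK: $3 nodes x $2 processes = $1 processes"
  else
    echo "Mismatch: $3 nodes x $2 processes = $(( $3 * $2 )), but --ntasks=$1" >&2
    return 1
  fi
}

check_even_layout 32 8 4   # values from the script above: prints the OK line
```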

Let the scheduler choose:

sample script – sample.sub

#!/bin/bash
# Submission script: "no placement constraints; the scheduler decides"

# Job name
#SBATCH --job-name=mpi_mm
# Output and error file names
#SBATCH --output=mpi_mm_v4.out
#SBATCH --error=mpi_mm_v4.err
#
# Set the required partition [change]
#SBATCH --partition=short
# Number of processes
#SBATCH --ntasks=64
# Memory per CPU in MB (one CPU per process by default, so this is the memory per process)
#SBATCH --mem-per-cpu=500
#
# Total wall-time
#SBATCH --time=00:05:00
#
# Use one hardware thread per core (no hyperthreading); recommended for floating-point-intensive, CPU-bound codes [Optional]
#SBATCH --threads-per-core=1
#
# To get email alerts [Optional]
# NOTE: To enable, remove one "#" from each of the two lines below and write your email ID after "=" (ex: #SBATCH --mail-user=hemanta.kumar@icts.res.in)
##SBATCH --mail-user=<your email ID>
##SBATCH --mail-type=ALL
#
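# Print the start time
date
# Launch the MPI program (replace the path with your own executable)
mpirun /home/hemanta.kumar/slurm_test/mpi_mm
# Alternative launcher: comment out the mpirun line above and uncomment the line below
#srun /home/hemanta.kumar/slurm_test/mpi_mm
# Print the end time
date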
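
With no placement options, the layout is only known once the job starts. Inside the job, Slurm sets SLURM_JOB_NODELIST (the allocated nodes) and SLURM_TASKS_PER_NODE, which gives the task count per node in a compressed form such as 2(x3),1 (three nodes with 2 tasks each, then one node with 1 task). The sketch below uses a hypothetical helper to expand that form so the actual distribution can be logged in the output file:

```shell
# Hypothetical helper: expand Slurm's compressed SLURM_TASKS_PER_NODE format,
# e.g. "2(x3),1" -> "2 2 2 1" (one task count per allocated node, in node order).
# The body runs in a subshell ( ... ) so the IFS and globbing changes stay local.
expand_tasks_per_node() (
  IFS=','
  set -f                          # no filename globbing while splitting on commas
  out=''
  for item in $1; do
    case $item in
      *\(x*\))                    # "count(xrepeats)"
        count=${item%%\(*}
        reps=${item#*\(x}
        reps=${reps%\)}
        ;;
      *)                          # plain "count"
        count=$item
        reps=1
        ;;
    esac
    i=0
    while [ "$i" -lt "$reps" ]; do
      out="$out $count"
      i=$((i + 1))
    done
  done
  echo "${out# }"
)

# Inside the job script, e.g. right after the first "date":
echo "Nodes: ${SLURM_JOB_NODELIST:-not running under Slurm}"
echo "Tasks per node: $(expand_tasks_per_node "${SLURM_TASKS_PER_NODE:-}")"
```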

Submit job:
sbatch sample.sub
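
On success, sbatch prints a line of the form "Submitted batch job NNNN". When a script needs the job ID for squeue or scancel, it can be extracted from that line; alternatively, sbatch --parsable prints only the ID (followed by the cluster name, if any). A minimal sketch, with a hypothetical helper name:

```shell
# Hypothetical helper: read sbatch's output on stdin and print only the job ID.
# Expects a line like "Submitted batch job 12345"; prints nothing otherwise.
parse_job_id() {
  sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Usage on the cluster (requires sbatch):
#   job_id=$(sbatch sample.sub | parse_job_id)
#   echo "Submitted job $job_id"
```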

The job’s status in the queue can be monitored with squeue (add -u username to list only that user’s jobs).
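
squeue can also watch a single job from a script: with -h (no header), --jobs=<job_id> and a state filter, it prints nothing once the job is no longer pending, running or completing. The helper below is a hypothetical sketch that polls until then; keep the interval generous so the scheduler is not loaded. sbatch --wait, which does not return until the submitted job terminates, is a simpler alternative when you submit and wait in the same script.

```shell
# Hypothetical helper: block until a job is no longer pending, running or completing.
# Usage: wait_for_job <job_id> [poll_interval_seconds]   (default interval: 60 s)
wait_for_job() {
  # -h drops the header, so the output is empty once the job no longer matches.
  # Errors (e.g. an ID the controller has already purged, or a transient
  # failure) are discarded and also end the loop in this sketch.
  # Jobs in any other state (e.g. suspended) are treated as finished.
  while [ -n "$(squeue -h --jobs="$1" --states=PENDING,RUNNING,COMPLETING 2>/dev/null)" ]; do
    sleep "${2:-60}"
  done
}

# Usage on the cluster (12345 is a placeholder job ID):
#   wait_for_job 12345 && echo "Job 12345 has left the queue"
```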

The job can be deleted (cancelled) with scancel <job_id>.

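The job’s standard output and standard error (including any error messages) are written to the files named by --output and --error in the submit script (for example mpi_mm_v1.out and mpi_mm_v1.err), created in the submission directory. If these options are omitted, both streams go to a single file named slurm-NNNN.out (where NNNN is the job ID).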
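
All four sample scripts use fixed file names, so by default a resubmitted job overwrites the .out and .err files of the previous run. Slurm expands patterns in --output and --error, including %x (the job name) and %j (the job ID), so a variant like the following gives each run its own files (a sketch; adapt the names to taste):

```shell
# Per-run output files: %x expands to the job name, %j to the job ID,
# e.g. mpi_mm_12345.out and mpi_mm_12345.err for job 12345 named mpi_mm
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
```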
