Queuing Systems & Scheduler

Job Scheduler (Slurm):

Slurm (Simple Linux Utility for Resource Management) is an open-source job scheduler that allocates compute resources on clusters to queued, researcher-defined jobs. Slurm has been deployed at various national and international computing centers and is used by approximately 60% of the TOP500 supercomputers in the world.

You can learn more about SLURM and its commands from the official Slurm website.

Queuing System:

When a job is submitted, it is placed in a queue. Different queues are available for different purposes. Select the one queue from the list below that best matches the computational needs of your job.

Slurm partitions are essentially different queues that point to collections of nodes. On Mario there are five partitions:

  • post-proc: This partition has 2 compute nodes that have been set aside for post-processing and test jobs. This partition has a maximum time limit of 1 hour.
  • parallel-short: This partition has 11 compute nodes that have been set aside for running only shorter parallel jobs. This partition has a maximum time limit of 12 hours.
  • serial-short: This partition has 8 compute nodes that have been set aside for running only serial jobs. This partition has a maximum time limit of 240 hours.
  • serial-long: This partition has 7 compute nodes that have been set aside for running only long serial jobs. This partition has no time limit.
  • parallel-long: This partition has 36 compute nodes that have been set aside for running only longer parallel jobs. This partition has a maximum time limit of 96 hours.

Queue name      No. of nodes  Node list  Default walltime  Total actual CPUs  Total CPUs with      About queue
                                         (days-hh:mm:ss)                      Hyper-Threading
--------------  ------------  ---------  ----------------  -----------------  -------------------  --------------------------------------------------------------------
post-proc       2             cn[1-2]    1:00:00           64                 128                  1 hr time limit, for post-processing, exclusively for parallel jobs
parallel-short  11            cn[3-13]   12:00:00          352                704                  12 hrs time limit, exclusively for parallel jobs
serial-short    8             cn[14-21]  10-00:00:00       256                512                  240 hrs time limit, exclusively for serial jobs
serial-long     7             cn[22-28]  no-limit          224                448                  No time limit, exclusively for serial jobs
parallel-long   36            cn[29-64]  4-00:00:00        1152               2304                 96 hrs time limit, exclusively for parallel jobs

NOTE: devel is the default partition
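
The partition rules above can be sketched as a small shell helper. This is only an illustration, not a site-provided tool: the function name pick_partition and its arguments are hypothetical, and the thresholds are simply the time limits listed for each partition. It does not consider post-proc, which is reserved for post-processing and test jobs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): suggest a partition name from the
# job type and the requested walltime in whole hours, using the time limits
# listed for each partition in this article.
pick_partition() {
    local kind="$1"    # "serial" or "parallel"
    local hours="$2"   # requested walltime, whole hours

    case "$kind" in
        parallel)
            if [ "$hours" -le 12 ]; then
                echo "parallel-short"
            elif [ "$hours" -le 96 ]; then
                echo "parallel-long"
            else
                echo "no parallel partition allows ${hours} hours" >&2
                return 1
            fi
            ;;
        serial)
            if [ "$hours" -le 240 ]; then
                echo "serial-short"
            else
                echo "serial-long"
            fi
            ;;
        *)
            echo "unknown job type: ${kind}" >&2
            return 1
            ;;
    esac
}

# Example: a 48-hour parallel job
pick_partition parallel 48   # prints "parallel-long"
```

The result could then be passed to the scheduler with the -p option, for example: sbatch -p "$(pick_partition parallel 48)" slurm_submit_script.sub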

Useful commands


Slurm command  Description                                                       Syntax
-------------  ----------------------------------------------------------------  --------------------------------------------------------------
sbatch         Submit a batch serial or parallel job using a Slurm submit script  sbatch slurm_submit_script.sub
srun           Run a script or application interactively                         srun --pty -p test -t 10 --mem 1000 /bin/bash [script or app]
scancel        Kill a job by job ID number                                       scancel 999999
squeue         View the status of your jobs                                      squeue -u <username> OR squeue -l
sinfo          View the cluster nodes, partitions and node status information    sinfo OR sinfo -lNe
sacct          Check a current job by job ID number                              sacct -j 999999
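
As a rough sketch of what a submit script passed to sbatch might look like, the example below requests two nodes in the parallel-short partition. The job name, output file name, resource numbers and the placeholder workload (uname -n) are assumptions for illustration only; 32 tasks per node is inferred from the CPU counts in the partition table. The #SBATCH lines are ordinary comments to the shell, so the script can also be dry-run outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=example_job          # hypothetical job name
#SBATCH --partition=parallel-short      # one of the partitions listed in this article
#SBATCH --nodes=2                       # assumed node count
#SBATCH --ntasks-per-node=32            # assumed: 32 physical CPUs per node
#SBATCH --time=02:00:00                 # must not exceed the partition's time limit
#SBATCH --output=example_job_%j.out     # %j is replaced by the job ID

# Slurm sets these variables inside a job; fall back to defaults for a dry run.
job_id="${SLURM_JOB_ID:-none}"
node_list="${SLURM_JOB_NODELIST:-$(uname -n)}"
echo "Job ${job_id} running on: ${node_list}"

# Launch the workload through srun when it is available (inside the cluster);
# otherwise run it directly so the script can be tested locally.
launcher="srun"
command -v srun >/dev/null 2>&1 || launcher=""

# Placeholder workload: replace "uname -n" with your own application.
$launcher uname -n
```

Saved as slurm_submit_script.sub, it would be submitted with sbatch slurm_submit_script.sub; the job can then be followed with squeue and sacct using the job ID that sbatch reports.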

Usage Guidelines

  • Users must submit jobs only through the scheduler.
  • Users must not run any job on the master node.
  • Users are not allowed to run a job by logging in directly to any compute node.