
Queuing System & Scheduler

Job Scheduler (SLURM):

Slurm (Simple Linux Utility for Resource Management) is an open-source job scheduler that allocates compute resources on clusters to queued, researcher-defined jobs. Slurm has been deployed at various national and international computing centers, and is used by approximately 60% of the TOP500 supercomputers in the world.

You can learn more about Slurm and its commands on the official Slurm website.

Queuing System:

When a job is submitted, it is placed in a queue. Different queues can be available for different purposes, and the user must select, from the queues listed below, the one that is appropriate for their computational needs.

Slurm partitions are essentially different queues that point to collections of nodes. On Sonic there is one partition:

  • long: this partition has 14 compute nodes that have been set aside for running longer jobs. This partition has no time limit.

Queue name | No. of Nodes | Node list   | Default walltime | Total No. of CPUs (threads)
long       | 14           | sonic[1-14] | no limit (inf)   | 1344

NOTE: The Sonic cluster does not have a "devel" partition; all nodes are in the single partition "long".
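
As a rough illustration of what the partition figures imply, the sketch below divides the total CPU (thread) count by the node count. It assumes all 14 nodes are identical, which this page does not state explicitly. The `sinfo` call only runs where the command is available, and the format string is one possible choice rather than a site recommendation.

```shell
# Figures taken from the partition listing for "long"
total_threads=1344
node_count=14

# Assumption: all nodes in "long" are identical, so the threads divide evenly
threads_per_node=$(( total_threads / node_count ))
echo "Approx. CPUs (threads) per node: ${threads_per_node}"

# On the cluster itself, sinfo can report the live layout
if command -v sinfo >/dev/null 2>&1; then
    # %P partition, %D node count, %N node list, %l time limit,
    # %C CPUs as allocated/idle/other/total
    sinfo -p long -o "%P %D %N %l %C" || echo "sinfo could not query the cluster" >&2
fi
```

If the nodes are not identical, the per-node value from `sinfo -lNe` is the one to trust.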

Useful commands

Slurm Command | Description | Syntax
sbatch  | Submit a batch serial or parallel job using a Slurm submit script | sbatch slurm_submit_script.sub
srun    | Run a script or application interactively | srun --pty -p long -t 10 --mem 1000 /bin/bash [script or app]
scancel | Kill a job by job ID number | scancel 999999
squeue  | View the status of your jobs | squeue -u <username> OR squeue -l
sinfo   | View the cluster nodes, partitions and node status information | sinfo OR sinfo -lNe
sacct   | Check a job by job ID number | sacct -j 999999
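
The `sbatch` entry refers to a submit script without showing one. The following is a minimal sketch of what such a script might contain, not a site-provided template. The job name, resource values, and output file name are illustrative assumptions. Only the partition name `long` comes from this page.

```shell
#!/bin/bash
#SBATCH --job-name=example_job          # hypothetical job name
#SBATCH --partition=long                # the only partition on Sonic
#SBATCH --ntasks=1                      # one task (serial job)
#SBATCH --cpus-per-task=1               # one CPU for that task
#SBATCH --mem=1000                      # memory per node, in MB by default
#SBATCH --output=example_job.%j.out     # %j expands to the job ID

# Slurm sets SLURM_JOB_ID inside a job; use a placeholder when run elsewhere
job_id="${SLURM_JOB_ID:-not-in-slurm}"

# Replace this line with the actual computation
echo "Job ${job_id} running on $(hostname)"
```

Saved as slurm_submit_script.sub, this would be submitted with `sbatch slurm_submit_script.sub`, monitored with `squeue -u <username>`, and cancelled with `scancel` followed by the job ID that `sbatch` reports.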

Usage Guidelines

  • Users must submit jobs only through the scheduler.
  • Users must not run any job on the master node.
  • Users are not allowed to run a job by logging in directly to any compute node.
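
One way a script could guard against being run outside the scheduler is to check for the `SLURM_JOB_ID` environment variable, which Slurm sets inside a job. The helper below is a hypothetical sketch. The function name `require_slurm_job` is invented for illustration and is not part of Slurm or of the Sonic setup.

```shell
# Hypothetical helper: succeeds only when running inside a Slurm job
require_slurm_job() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "Not inside a Slurm job; submit this work with sbatch or srun." >&2
        return 1
    fi
    return 0
}

# Example use at the top of a compute script:
# require_slurm_job || exit 1
```

This check is only a convenience for catching accidental runs on the master node; it does not replace the guidelines above.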