
Introduction

This user's guide for the CMS provides the minimum information needed by a new user of the system. It assumes that the user is familiar with the basics of scientific computing, in particular the basic commands of the Unix operating system, and with the usual techniques for running applications on a supercomputer.

This guide includes basic technical documentation about the CMS, its software environment, and the available applications.

Please read it carefully, and if any doubt arises do not hesitate to contact our support address (see Getting Help below).


Connecting to GridUI

View Cluster usage guidelines


SLURM Concept

SLURM manages user jobs with the following key characteristics (see the example script after this list):

  • a set of requested resources:
    • number of computing resources: nodes (including all their CPUs and cores), CPUs (including all their cores), or cores
    • amount of memory: either per node or per (logical) CPU
    • the (wall)time needed for the user's tasks to complete their work
  • a set of constraints limiting jobs to nodes with specific features
  • a requested node partition (job queue)
  • a requested quality of service (QoS) level, which grants users specific access
  • a requested account for accounting purposes
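
A minimal job script sketch illustrating these concepts (the account and program names below are placeholders; replace all values with those valid for your allocation):

#!/bin/bash
#SBATCH --nodes=1                # computing resources: 1 node
#SBATCH --ntasks=4               # 4 tasks (processes) within that node
#SBATCH --mem-per-cpu=2000       # memory per (logical) CPU, in MB
#SBATCH --time=01:00:00          # wall time needed by the tasks
#SBATCH --constraint=skylake     # limit the job to nodes with this feature
#SBATCH --partition=compute      # node partition (job queue)
#SBATCH --qos=main               # quality of service level
#SBATCH --account=myproject      # account used for accounting purposes

srun ./my_application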

Running Jobs

As defined above, SLURM is the utility used at the CMS for batch processing, so all jobs must be run through it. This section provides the information needed to get started with job execution at the CMS (see the official SLURM documentation for more details on how to create a job).


NOTE: To keep the load on the login nodes under control, a 10-minute CPU time limit is enforced for processes running interactively on these nodes. Any execution exceeding this limit must be carried out through the queue system.
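
For example, a command that would exceed this limit on a login node can be wrapped into a one-line batch job instead (the command and input file names below are only illustrative):

sbatch --time=00:30:00 --wrap="./long_running_task input.dat"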

Manage Jobs

A job is the execution unit for SLURM. It is defined by a script containing a set of directives describing the job and the commands to execute.

These are the basic commands to submit and manage jobs (a short usage example follows the list):

  • sbatch <job script>: submits a job script to the queue system (see below for job script directives).
  • squeue [-u user]: shows all the jobs submitted by all users, or only those of a given user if the -u option is specified.
  • scancel <job id>: removes a job from the queue system, canceling its execution if it was already running.
  • srun: submits an interactive job or runs a (parallel) job step.
  • sinfo: shows partition and node information.
  • scontrol: detailed control of and information on jobs, queues and partitions.
  • sstat: shows resource utilization (memory, I/O, energy) for running jobs / job steps.
  • sacct: shows resource utilization for completed jobs / job steps (from the accounting database).
  • sprio: shows job priority factors.
  • sshare: shows accounting share information (usage, fair-share, etc.).
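
A minimal workflow sketch using these commands (the script name and job ID below are placeholders):

sbatch job.sh              # submit the job script; SLURM prints the assigned job ID
squeue -u $USER            # check the state of your queued and running jobs
scancel 123456             # cancel job 123456 if needed
sacct -j 123456            # inspect accounting data once the job has finished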

Job directives

A job script must contain a series of directives informing the batch system about the characteristics of the job; you can configure them to fit your needs. These directives appear as comments in the job script, usually at the top just after the shebang line, with the following syntax:

#SBATCH --option=value

Note that these directives:

  • start with the #SBATCH prefix
  • are always lowercase
  • have no spaces in between
  • don't expand shell variables (they are just shell comments)

This table describes the common directives you can define in your job; the default value is given after each description (see the examples below):

--job-name=...
The name of the job as it appears in the batch queue. Default: script_name

--partition=...
The name of the queue (partition) in SLURM. Default: compute

--output=...
The name of the file to collect the standard output of the job. The %j part will be substituted by the job ID. Default: file-%j.out

--error=...
The name of the file to collect the standard error of the job. The %j part will be substituted by the job ID. Default: file-%j.err

--chdir=...
The working directory of your job (i.e. where the job will run). If not specified, it is the current working directory at the time the job was submitted. Default: the submitting directory

--qos=...
Quality of Service (or queue) where the job is allocated. By default, a QoS is assigned to the user, so this directive is not mandatory. Default: main

--time=...
The wall clock time limit. This is a mandatory field: set it to a value greater than the real execution time of your application and smaller than the time limits granted to the user. Note that your job will be killed once this period has elapsed. The format can be: m, m:s, h:m:s, d-h, d-h:m or d-h:m:s. Default: the QoS default time limit

--ntasks=...
The number of processes to allocate as parallel tasks. Default: 1

--cpus-per-task=...
The number of processors (cores) for each task. Without this option, the controller will just try to allocate one processor per task. The number of CPUs per task must be between 1 and 16, since each node has 16 cores (one per thread). Default: 1

--ntasks-per-node=...
The number of tasks allocated on each node. When an application uses more than 3.8 GB of memory per process, it is not possible to fit 16 processes in the same node with its 64 GB of memory. It can be combined with --cpus-per-task to allocate nodes exclusively, e.g. to allocate 2 processes per node, set both directives to 2. The number of tasks per node must be between 1 and 16. Default: 1

--mem-per-cpu=...
Minimum memory required per allocated CPU. Default units are megabytes unless the SchedulerParameters configuration parameter includes the "default_gbytes" option for gigabytes. Default: DefMemPerCPU
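
Putting several of these directives together, a job script header might look like the following sketch (the values are only illustrative; adjust them to your needs):

#!/bin/bash
#SBATCH --job-name=myjob         # name shown in the batch queue
#SBATCH --partition=compute      # partition (queue) to use
#SBATCH --qos=main               # quality of service
#SBATCH --output=myjob-%j.out    # standard output file (%j = job ID)
#SBATCH --error=myjob-%j.err     # standard error file
#SBATCH --time=02:00:00          # wall clock limit (h:m:s)
#SBATCH --ntasks=4               # number of parallel tasks
#SBATCH --cpus-per-task=1        # cores per task
#SBATCH --mem-per-cpu=2000       # memory per CPU, in MB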

Job environment variables

There are also a few SLURM environment variables you can use in your scripts:

SLURM_JOBID
The job ID of the executing job.

SLURM_NPROCS
The total number of processes in the job (same as -n, --ntasks).

SLURM_NNODES
The actual number of nodes assigned to run the job.

SLURM_PROCID
The MPI rank (or relative process ID) of the current process, ranging from 0 to SLURM_NPROCS-1.

SLURM_NODEID
The relative node ID within the current job, ranging from 0 to SLURM_NNODES-1.

SLURM_LOCALID
The node-local task ID of the process within the job.

SLURM_NODELIST
The list of nodes on which the job is actually running.

SLURM_SUBMIT_DIR
The directory from which sbatch was invoked.

SLURM_MEM_PER_CPU
The memory available per CPU used.
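
A short sketch showing how these variables can be used inside a job script, for example to record where and how the job ran:

#!/bin/bash
#SBATCH --job-name=envinfo
#SBATCH --output=envinfo-%j.out
#SBATCH --ntasks=2
#SBATCH --time=5:00

echo "Job $SLURM_JOBID submitted from $SLURM_SUBMIT_DIR"
echo "Running $SLURM_NPROCS task(s) on $SLURM_NNODES node(s): $SLURM_NODELIST"
srun bash -c 'echo "task $SLURM_PROCID (local ID $SLURM_LOCALID) on node $SLURM_NODEID"'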

Interactive batch jobs

To launch an interactive session on a compute node, run:

srun --time=1:00:00 --ntasks 2 --nodes=1 --account=chpc --partition=ember --pty /bin/tcsh -l

The important flags are --pty, which requests an interactive terminal, and /bin/tcsh -l, which is the shell to run (the trailing -l starts a login shell). If you prefer bash, replace /bin/tcsh with /bin/bash. As when submitting a batch script, -n specifies the number of tasks and -N the number of nodes. srun's own -l (or --label) option, if used, prepends the remote task ID to each line of stdout/stderr. Note that the order of the command is important: the "--pty /bin/tcsh -l" part has to come at the end.

The srun flags can be abbreviated as:

srun -t 1:00:00 -n 2 -N 1 -A chpc -p ember --pty /bin/tcsh -l

By default, srun passes all environment variables of the parent shell, so the X window connection is preserved as well, allowing graphical (GUI-based) applications to be run inside the interactive job.
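
For example, assuming you connected to the login node with X forwarding enabled, a GUI program such as xterm can be started from within the interactive shell (a sketch; the host name is a placeholder and the partition/account are those used in the examples above):

ssh -Y user@login.node                                    # connect with X forwarding enabled
srun -t 1:00:00 -n 1 -p ember -A chpc --pty /bin/bash -l  # start the interactive job
xterm &                                                   # GUI program runs on the compute node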

Cluster queues tend to be very busy, so it may take some time for an interactive job to start. For this reason, two nodes geared towards interactive work have been added in a special partition on the notchpeak cluster. Job limits on this partition are 8 hours of wall time and a maximum of ten submitted jobs per user, with at most two running jobs and a maximum total of 32 tasks and 128 GB of memory. To access this special partition, called notchpeak-shared-short, request both an account and a partition under this name, e.g.:

srun -N 1 -n 2 -t 2:00:00 -A notchpeak-shared-short -p notchpeak-shared-short --pty /bin/bash -l

Job examples

Example for a sequential job:

#!/bin/bash
#
#SBATCH --job-name=hello
#SBATCH --output=hello.out
#SBATCH --ntasks=1
#SBATCH --time=10:00
# From here the job starts
echo "Running on $(hostname)"
sleep 60
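
Assuming the script above is saved as hello.sh, it can be submitted and its output inspected as follows (a sketch; the reported job ID will differ):

sbatch hello.sh          # prints: Submitted batch job <job id>
squeue -u $USER          # the job appears in the queue until it finishes
cat hello.out            # output collected via the --output directive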

Example for a parallel job:

#!/bin/bash
#
#SBATCH --job-name=hello
#SBATCH --output=hello.out
#SBATCH --ntasks=4 # The job runs 4 parallel tasks
#SBATCH --time=10:00
# From here the job starts
srun parallel.sh
srun sleep 60
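
A further sketch combining the directives described above to spread tasks over nodes and request memory explicitly (mpi_app is a placeholder for your own MPI application):

#!/bin/bash
#SBATCH --job-name=mpi_hello
#SBATCH --output=mpi_hello-%j.out
#SBATCH --ntasks=4               # 4 MPI processes in total
#SBATCH --ntasks-per-node=2      # 2 processes on each node (so 2 nodes are used)
#SBATCH --cpus-per-task=2        # 2 cores per process
#SBATCH --mem-per-cpu=2000       # 2000 MB per allocated CPU
#SBATCH --time=30:00
# From here the job starts
srun ./mpi_app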

Example for an interactive job:

srun -p interactive --qos qos-interactive --time=0:30 -N2 --ntasks-per-node=4 --pty bash -i

salloc -p interactive --qos qos-interactive bash -c 'ssh -Y $(scontrol show hostnames | head -n 1)'

srun -p interactive --qos qos-besteffort --cpu-bind=none -N1 -n4 --pty bash -i

srun -C skylake -p batch --time=0:10:0 -N1 -c28 --pty bash -i

Example for a batch job:

sbatch job.sh

sbatch -N 2 job.sh

sbatch -p batch --qos qos-batch job.sh

sbatch -p long --qos qos-long job.sh

sbatch --begin=2019-11-23T07:30:00 job.sh

sbatch -p batch --qos qos-besteffort job.sh

sbatch -p bigmem --qos qos-bigmem --mem=2T job.sh

Examples for querying status and details of jobs, partitions, nodes and reservations:

squeue / squeue -l / squeue -la / squeue -l -p batch / squeue -t PD

scontrol show nodes / scontrol show nodes $nodename

sinfo / sinfo -s / sinfo -N

sinfo -T
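
To inspect a specific job, the commands listed in Manage Jobs can be used as well, for example (the job ID is a placeholder):

scontrol show job 123456        # detailed information about a queued or running job
sstat -j 123456                 # resource usage of a running job
sacct -j 123456 --format=JobID,JobName,Elapsed,MaxRSS,State    # accounting data after completion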

Getting Help

IFCA provides consulting assistance to users. User support consultants are available during normal business hours, Monday to Friday, 9:00 to 17:00 (CEST).

User questions and support are handled at: <grid.support@ifca.unican.es>

If you need assistance, please supply us with the nature of the problem, the date and time that the problem occurred, and the location of any other relevant information, such as output or log files.










