Introduction

This user's guide for the High Performance Computing (HPC) system at IFCA is intended to provide the minimum amount of information needed by a new user of the system. As such, it assumes that the user is familiar with the basic notions of scientific computing, in particular the basic commands of the Unix operating system, and with basic techniques for running applications on a supercomputer, such as MPI or OpenMP.

...

Please read it carefully, and if any doubt arises, do not hesitate to contact our support mail (see Getting Help below).

System Overview 

The HPC system at IFCA includes 60 wncompute nodes, 158 nodes of the Altamira supercomputer, 5 GPU nodes and 2 login servers. The nodes are grouped into partitions based on the CPU architecture family. The main features of each type of compute node are specified below:

    • One group of 20 nodes with 2 Intel Xeon Platinum 8358 processors, each one with 64 cores operating at 2.6 GHz, 370 GB of RAM and 862 GB of local disk.
    • Another group of 40 nodes with 2 AMD EPYC 7543 processors, each one with 64 cores operating at 2.8 GHz, 500 GB of RAM and 500 GB of local disk.
    • Altamira nodes with 2 Intel Sandy Bridge E5-2670 processors, each one with 8 cores operating at 2.6 GHz and a cache of 20 MB, 64 GB of RAM (i.e. 4 GB/core) and 500 GB of local disk.
    • GPU nodes with 80 cores operating at 2.1 GHz, 188 GB of RAM, 2 NVIDIA Tesla V100, InfiniBand EDR and 100 GB of local disk.

Operating System (OS)

All compute and GPU nodes run AlmaLinux 9, except the Altamira nodes, which run CentOS 7.9.

Network

All the nodes have InfiniBand and Gigabit Ethernet networks. The wncompute and GPU nodes use IB EDR, and the Altamira nodes use IB FDR.

Storage

All the nodes are connected to a global storage system based on GPFS (General Parallel File System), providing a total storage of 2.1 PB.

File Systems

Each node has several areas of disk space for installing system software or programs and for storing files. These areas may have size or time limits; please read this section carefully to learn the usage policy of each of these filesystems. All user areas reside on the GPFS filesystem, a distributed networked filesystem that can be accessed from all the nodes.

GPFS filesystem

The IBM General Parallel File System (GPFS), now branded as IBM Storage Scale, is a high-performance shared-disk file system that provides fast, reliable data access from all blades of the cluster to a global filesystem. GPFS allows parallel applications simultaneous access to a set of files (even a single file) from any node that has the GPFS file system mounted, while providing a high level of control over all file system operations. These filesystems are recommended for most jobs, because GPFS provides high-performance I/O by "striping" blocks of data from individual files across multiple disks on multiple storage devices and reading/writing these blocks in parallel. In addition, GPFS can read or write large blocks of data in a single I/O operation, thereby minimizing overhead.

These are the GPFS filesystems available from all nodes:

    • /gpfs/users/res → /home: users' home. This filesystem holds the home directories of all users; when you log into the system you start in your home directory by default. Every user has their own home directory to store executables, their own sources and their personal data. A default quota is enforced for all users, limiting the amount of data that can be saved here.
    • /gpfs/projects: In addition to the home directory, there is a directory for each group of users of Altamira. All the users of the same project share their common /gpfs/projects space, and it is the responsibility of each project manager to determine and coordinate the best use of this space and how it is distributed or shared among their users. If a project needs more disk space in this or any other GPFS filesystem, the project manager must request it, specifying the space needed and the reasons why it is needed (see the Getting Help section to know how to contact us).
    • /gpfs/res_apps: This filesystem holds the applications and libraries that have already been installed on the global system for basic scientific usage. Take a look at the directories or go to the Software section to see the applications available for general use. To use an application, you must load its module, as detailed below in the Software section. Before installing any application needed by your project, first check whether it is already installed on the system. Basic scientific applications are available by default, but if an application or a specific version that you need is not on the system, try to install it yourself in your home or project directory. See the Getting Help section to know how to contact us. If it is a general application with no restrictions on its use, it will be installed in a public directory, that is, under /gpfs/res_apps, so that all users can make use of it. All applications in /gpfs/res_apps are installed, controlled and supervised by the user support team. This does not mean that users cannot help in this task; both can work together to get the best result. The user support team can provide its wide experience in compiling and optimizing applications on the Altamira cluster, and the users can provide their knowledge of the application to be installed. General applications that have been modified in some way from their normal behavior by the project users for their own study, and that may not be suitable for general use, must be installed under /gpfs/res_projects or /gpfs/res_home, depending on the usage scope of the application, but not under /gpfs/res_apps.
    • /gpfs/res_projects: Only for Altamira nodes. In addition to the home directory, there is a directory in /gpfs/res_projects for each group of users of Altamira. All the users of the same project share their common /gpfs/res_projects space, and it is the responsibility of each project manager to determine and coordinate the best use of this space and how it is distributed or shared among their users. If a project needs more disk space in this or any other GPFS filesystem, the project manager must request it, specifying the space needed and the reasons why it is needed (see the Getting Help section to know how to contact us).
    • /gpfs/res_scratch: Only for Altamira nodes. Each Altamira user has a directory under /gpfs/res_scratch; use this space to store the temporary files of your jobs during their execution. By default, files may reside in this filesystem for up to 7 days without modification; any older file might be removed. A quota per group is enforced, depending on the space assigned.
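
Given the 7-day policy on the scratch filesystem, it can be useful to check which of your files have not been modified recently. The following is a minimal sketch, assuming GNU find is available; the function name list_stale_files and the example path are hypothetical, and the actual cleanup performed on the system may use different criteria:

```bash
#!/bin/bash
# List regular files under a directory that have not been modified for
# more than a given number of whole days (default 7, as in the scratch policy).
# list_stale_files is an illustrative helper, not a site-provided tool.
list_stale_files() {
  local dir=$1
  local days=${2:-7}
  # find rounds the file age down to whole days, so "-mtime +7"
  # matches files that are at least 8 days old
  find "$dir" -type f -mtime +"$days" -print
}

# Example call (hypothetical path; replace it with your own scratch directory):
# list_stale_files /gpfs/res_scratch/your_directory
```

Running it before a long job campaign gives you the chance to copy results you still need to your home or project space.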

Batch System

The job scheduling system running at Altamira is SLURM. The current version of Slurm in Altamira is 23.02.7.

Partition

Slurm uses partitions to allocate jobs. A group of nodes with the same features is assigned to a partition. The following table summarizes the main partitions and their principal properties. By default, the compute partition is used if the --partition option is not defined in the job. Set it to res if you are coming from a RES project, set it to a specific partition based on your user group, or use the testing partition for test jobs before submitting to the production partition (compute). Test jobs in the testing queue have a limit of 8 CPUs and 3 hours of run time, as you can see below in the QoS section.

Partition | Description
wncompute_ifca | 10 nodes for ifca users
wncompute_meteo | 30 nodes for the meteo group
wncompute_astro | 20 nodes for the astro group
wngpu | 5 nodes for advanced analytics computing
compute | Main partition in Altamira
res | Group of nodes for RES users
testing | This partition can be used to test new workflows and also to help new users familiarise themselves with the SLURM batch system. Both serial and parallel code can be tested on the testing partition.

QoS

Specify the Quality of Service (QoS) for each job submitted to Slurm. Jobs request a QoS using the "--qos=" option of the sbatch, salloc, and srun commands. By default,

...

GroupQueueDescriptionLimit number of nodesMax CPU cores per userMax Run TimeMax jobs per user
Ifca, astro, meteo, etcmainLocalmainMain queue for local generic users

15832

5123 days1000
testingThis queue can be used to test new workflows and also to help new users to familiarise themselves with the SLURM batch system.  Both serial and parallel code can be tested on the testing queue.48323 hours2

Res

res_aMain queue for RES users

12864

1024 (64 nodes)3 days500
res_cQueue assigned for res users when they reach the period hours limits requested464 (4 nodes)1 day10


Connecting to the HPC system

Once you have a username and its associated password, you can get into the Altamira system by connecting to one of the 2 login nodes: login1.ifca.es for the first login node or login2.ifca.es for the second one (see Login Node for more information about the login nodes). The password provided is temporary: you must change this initial password and synchronize the OTP after connecting to IPA, following the instructions provided by the support team. Also, use a strong password (do not use a word or phrase from a dictionary, and do not use a word that can be obviously tied to you).

You must use Secure Shell (ssh) tools to log in to Altamira or to transfer files into it. We do not accept incoming connections from protocols such as telnet, ftp, rlogin, rcp, or rsh. The connection to the login nodes must be made in the following way:

$ ssh username@loginX.ifca.es (where X is 1 or 2)

Once you are logged into the login nodes, you cannot make outgoing connections, for security reasons (contact us in an exceptional case).

If you cannot get access to the system after following this procedure, contact us (see Getting Help to know how to contact us).

Login Node

Once inside the machine, you will be presented with a UNIX shell prompt and you will normally be in your home ($HOME) directory. If you are new to UNIX, you will have to learn the basics before you can do anything useful.

The machine you are logged into is the login node of Altamira (login1). This machine acts as a front end and is typically used for editing, compiling, preparing and submitting batch executions, and as a gateway for copying data into or out of Altamira.

Running CPU-bound programs on this node is not permitted. If a compilation needs more CPU time than permitted, it must be done through the batch queue system. It is not possible to connect directly to the compute nodes from the login node; all resource allocation is done by the batch queue system.

Compute Node

As you already know, Altamira includes 158 main compute nodes where all executions must be done. For security reasons, it is not possible to connect directly to the worker nodes; all executions must be allocated on the nodes through the batch queue system (see below how to submit jobs).

Running Jobs

As stated above, SLURM is the utility used at the HPC system for batch processing support, so all jobs must be run through it. This part provides information for getting started with job execution at Altamira (see the official Slurm documentation for more information on how to create a job).

Info
NOTE: To keep the load on the login nodes reasonable, a limit of 10 minutes of CPU time is set for processes running interactively on these nodes. Any execution taking longer than this limit should be carried out through the queue system.

Manage Jobs

A job is the execution unit for SLURM. A job is defined by a script containing a set of directives describing the job, and the commands to execute.

These are the basic commands to submit and manage jobs:

  • sbatch <job script>: submits a job script to the queue system (see below for job script directives).
  • squeue [-u user]: shows the jobs submitted by all users, or only those of the given user if you specify the -u option.
  • scancel <job id>: removes the job from the queue system, canceling its execution if it was already running.
  • scontrol show job <job-id>: shows detailed information about a specific job.
  • scrontab [-u user] <file>: sets, edits, and removes a user's Slurm-managed crontab. This file can define a number of recurring batch jobs to run on a scheduled interval. It uses the same syntax as cron. More info at scrontab.
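
The commands above can be combined in a small helper. The following is a minimal sketch, not a site-provided tool: submit_and_show is a hypothetical function name and job.sh is a placeholder for your own script. It relies on the --parsable option of sbatch, which prints only the job ID (optionally followed by a semicolon and the cluster name):

```bash
#!/bin/bash
# Submit a job script and immediately show its details.
# submit_and_show is an illustrative helper name, not a Slurm command.
submit_and_show() {
  local script=$1
  local jobid
  # --parsable makes sbatch print "jobid" or "jobid;cluster" and nothing else
  jobid=$(sbatch --parsable "$script") || return 1
  # keep only the job ID, dropping an optional ";cluster" suffix
  jobid=${jobid%%;*}
  echo "Submitted job ${jobid}"
  scontrol show job "$jobid"
}

# Example call (job.sh is a placeholder for your own job script):
# submit_and_show job.sh
# To cancel the job later: scancel <job id>
```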

Alternatively, you can launch your jobs interactively, to run your session on one of the compute nodes you have requested (used occasionally for graphical applications, etc.):

Code Block
languagebash
srun -N 2 --ntasks-per-node=8 --pty bash

This requests 2 nodes (-N 2) and states that a maximum of 8 tasks per node will be launched (--ntasks-per-node=8). The command to run is a shell (bash) on the compute nodes. The --pty option is important: it gives a prompt and a session that looks very much like a normal interactive session, but it runs on one of the compute nodes.

Job directives

A job must contain a series of directives that inform the batch system about the characteristics of the job; you can configure them to fit your needs. These directives appear as comments in the job script, usually at the top just after the shebang line, with the following syntax:

...

Directive | Description | Default value
--job-name=value | The name of the job that appears in the batch queue. | script_name
--partition=... | The name of the partition (queue) in Slurm. | compute
--output=... | The name of the file that collects the standard output of the job. The %j part in the job directives is substituted by the job ID. | file-%j.out
--error=... | The name of the file that collects the standard error of the job. The %j part in the job directives is substituted by the job ID. | file-%j.err
--chdir=... | The working directory of your job (i.e. where the job will run). If not specified, it is the current working directory at the time the job was submitted. | submitting directory
--qos=... | Quality of Service (or queue) where the job is allocated. By default, a queue is assigned to the user, so this directive is not mandatory. | main
--time=... | The wall clock time limit. This is a mandatory field: set it to a value greater than the real execution time of your application and smaller than the time limits granted to the user. Notice that your job will be killed once this period has elapsed. The format can be: m, m:s, h:m:s, d-h, d-h:m or d-h:m:s. | qos default time limit
--ntasks=... | The number of processes to allocate as parallel tasks. | 1
--cpus-per-task=... | Number of processors for each task. Without this option, the controller will just try to allocate one processor per task. The number of cpus per task must be between 1 and 16, since each node has 16 cores (one for each thread). | 1
--ntasks-per-node | The number of tasks allocated on each node. When an application uses more than 3.8 GB of memory per process, it is not possible to have 16 processes on the same node with its 64 GB of memory. It can be combined with --cpus-per-task to allocate the nodes exclusively, i.e. to allocate 2 processes per node, set both directives to 2. The number of tasks per node must be between 1 and 16. | 1
--mem-per-cpu | Minimum memory required per allocated CPU. Default units are megabytes unless the SchedulerParameters configuration parameter includes the "default_gbytes" option for gigabytes. | DefMemPerCPU
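
To make the accepted --time formats concrete, the following minimal sketch converts a value in any of the formats listed above (m, m:s, h:m:s, d-h, d-h:m, d-h:m:s) into seconds. The function name slurm_time_to_seconds is hypothetical and is not part of Slurm; it is only meant to illustrate how each format is read:

```bash
#!/bin/bash
# Convert a Slurm --time value to seconds.
# Supported formats: m, m:s, h:m:s, d-h, d-h:m, d-h:m:s
# slurm_time_to_seconds is an illustrative helper, not a Slurm command.
slurm_time_to_seconds() {
  local spec=$1 days=0 rest h=0 m=0 s=0
  local -a parts
  if [[ $spec == *-* ]]; then
    # With a day prefix, the remaining fields are read as h, h:m or h:m:s
    days=${spec%%-*}
    rest=${spec#*-}
    IFS=: read -r -a parts <<< "$rest"
    h=${parts[0]:-0}; m=${parts[1]:-0}; s=${parts[2]:-0}
  else
    # Without a day prefix, a single number means minutes
    IFS=: read -r -a parts <<< "$spec"
    case ${#parts[@]} in
      1) m=${parts[0]} ;;
      2) m=${parts[0]}; s=${parts[1]} ;;
      3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
      *) echo "unsupported time format: $spec" >&2; return 1 ;;
    esac
  fi
  # 10# forces base 10 so that values such as "08" are not read as octal
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

slurm_time_to_seconds 10:00    # 10 minutes -> prints 600
slurm_time_to_seconds 1-12:30  # 1 day, 12 hours, 30 minutes -> prints 131400
```

Note that a value such as 10:00 means 10 minutes and 0 seconds, not 10 hours.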

Job environment variables

There are also a few SLURM environment variables you can use in your scripts:

Variable | Description
SLURM_JOBID | Specifies the job ID of the executing job.
SLURM_NPROCS | Specifies the total number of processes in the job. Same as -n, --ntasks.
SLURM_NNODES | The actual number of nodes assigned to run your job.
SLURM_PROCID | Specifies the MPI rank (or relative process ID) of the current process. The range is from 0 to (SLURM_NPROCS-1).
SLURM_NODEID | Specifies the relative node ID of the current job. The range is from 0 to (SLURM_NNODES-1).
SLURM_LOCALID | Specifies the node-local task ID of the process within a job.
SLURM_NODELIST | Specifies the list of nodes on which the job is actually running.
SLURM_SUBMIT_DIR | The directory from which sbatch was invoked.
SLURM_MEM_PER_CPU | Memory available per CPU used.
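
As an illustration of how these variables can be used, the following minimal sketch prints a one-line summary of the job context. The function name job_summary is hypothetical, and the fallback values after ":-" are placeholders chosen here so that the snippet also runs outside a job, where the variables are unset:

```bash
#!/bin/bash
# Print a one-line summary of the Slurm context of the current process.
# Outside a job the variables are unset, so placeholder values are shown.
# job_summary is an illustrative helper name.
job_summary() {
  echo "job=${SLURM_JOBID:-none} tasks=${SLURM_NPROCS:-1} nodes=${SLURM_NNODES:-1} rank=${SLURM_PROCID:-0} dir=${SLURM_SUBMIT_DIR:-$PWD}"
}

job_summary
```

Calling job_summary at the top of a job script leaves a record in the job output file of where and how the job ran.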


Job examples

  • Example for a sequential job:

Code Block
languagebash
#!/bin/bash
#
#SBATCH --job-name=hello
#SBATCH --output=hello.out
#SBATCH --ntasks=1
#SBATCH --time=10:00
# From here the job starts
# To clean and load modules defined at the compile and link phases
module purge
module load ...

# echo of commands
set -x

# To compute in the submission directory
cd ${SLURM_SUBMIT_DIR}

# execution
hostname
python example.py

  • Example for a parallel job - MPI Parallel job.

Info

Take into account that srun only works with version 4.1.0 of OPENMPI; other versions produce an error. If you use another version of OPENMPI, launch the program with mpirun instead.

Code Block
languagebash
#!/bin/bash
#
#SBATCH --job-name=hello
#SBATCH --output=hello.out
#SBATCH --ntasks=2
# Each task spawns on 4 cores
#SBATCH --cpus-per-task=4
#SBATCH --time=10:00
# From here the job starts
# srun only works with version 4.1.0 of OPENMPI
module load OPENMPI/4.1.0

# echo of commands
set -x

# To compute in the submission directory
cd ${SLURM_SUBMIT_DIR}

# number of OpenMP threads
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# Binding OpenMP threads on core
export OMP_PLACES=cores

# execution with 'OMP_NUM_THREADS' OpenMP threads
srun openmp.sh
srun sleep 60

Examples of scrontab files:

To create a job that runs at the beginning of each hour, uses the high partition and the uc00 account, and has a walltime of 1 minute, you would add the following to scrontab.

Code Block
languagebash
DIR=/home/user1
#SCRON -p high
#SCRON -A uc00
#SCRON -t 1:00

@hourly $DIR/multi-job.sh

To have a job run every Wednesday, every other hour during the work day, at each of the first five minutes of the hour and again at the thirty-minute mark, you would add the following to scrontab.

Code Block
languagebash
1-5,30 8-17/2 * * wed $DIR/multi-job.sh
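
To see which minutes and hours an entry such as "1-5,30 8-17/2 * * wed" selects, the two numeric fields can be expanded by hand. The following is a minimal sketch using the standard seq utility; expand_cron_field is a hypothetical helper written for this illustration (it does not handle "*" or day names) and is not part of scrontab:

```bash
#!/bin/bash
# Expand one cron field made of comma-separated items, where each item is
# a number, a range "a-b", or a stepped range "a-b/step".
# expand_cron_field is an illustrative helper; "*" and names are not handled.
expand_cron_field() {
  local field=$1 item range step first last
  local -a items out=()
  IFS=, read -r -a items <<< "$field"
  for item in "${items[@]}"; do
    step=1
    range=$item
    # split off an optional "/step" suffix
    if [[ $item == */* ]]; then
      step=${item#*/}
      range=${item%/*}
    fi
    # a single number is treated as a range of length one
    if [[ $range == *-* ]]; then
      first=${range%-*}
      last=${range#*-}
    else
      first=$range
      last=$range
    fi
    out+=( $(seq "$first" "$step" "$last") )
  done
  echo "${out[*]}"
}

expand_cron_field "1-5,30"   # minutes: prints 1 2 3 4 5 30
expand_cron_field "8-17/2"   # hours:   prints 8 10 12 14 16
```

So the entry runs at minutes 1 to 5 and 30 of the hours 8, 10, 12, 14 and 16, on Wednesdays only.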

Software

Modules Environment

The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles. Each modulefile contains the information needed to configure the shell for an application or a compilation. Modules can be loaded and unloaded dynamically and atomically, in a clean fashion. All popular shells are supported, including bash, ksh, zsh, sh, csh, tcsh, as well as some scripting languages such as perl.

...

  • module switch acts as a module unload and a module load command at the same time

Job submitting with Modules

The modules for the applications a job needs must be loaded inside the job script, so that they are loaded on the worker nodes when the job runs.

Acknowledgment in publications

Please acknowledge IFCA at the University of Cantabria for the use of its supercomputing resources, with a text similar to:

'We acknowledge the IFCA computing support group at the University of Cantabria, who provided access to the Altamira supercomputer and the Advanced Computing and E-Science computing resources at the Institute of Physics of Cantabria (IFCA-CSIC), member of the Spanish Supercomputing Network, for performing simulations/analyses.'

Getting Help

IFCA provides consulting assistance to users. User support consultants are available during normal business hours, Monday to Friday, from 09:00 to 17:00 (CEST).

...

If you need assistance, please tell us the nature of the problem, the date and time at which it occurred, and the location of any other relevant information, such as output or log files.

...