This user's guide for the High Performance Computing facility at IFCA is intended to provide the minimum amount of information needed by a new user of the system. As such, it assumes that the user is familiar with basic notions of scientific computing, in particular the basic commands of the Unix operating system, and with basic techniques for executing applications on a supercomputer, such as MPI or OpenMP.
The information in this guide includes basic technical documentation about the system, the software environment, and the available applications.
Please read it carefully, and if any doubt arises do not hesitate to contact our support email (see Getting help below).
The HPC system at IFCA includes 60 wncompute nodes, 158 nodes for the Altamira supercomputer, 5 GPU nodes and 2 login servers. The nodes are grouped into different partitions based on the CPU architecture family. The main features of each compute node are specified below:
All compute and GPU nodes, including the Altamira nodes, run AlmaLinux 9.
All nodes include a 10/25 Gigabit Ethernet network; the wncompute and GPU nodes also have InfiniBand EDR/HDR, while the acompute nodes have 10 Gbps Ethernet and OPA 100.
All nodes are connected to a global storage system based on GPFS 5.x (IBM General Parallel File System), providing a total storage capacity of 3.2 PB.
Each node has several areas of disk space for installing system software or programs and for storing files. These areas may have size or time limits, so please read this section carefully to understand the usage policy of each of these filesystems. All user areas sit on top of the GPFS filesystem, a distributed network filesystem that can be accessed from all the nodes.
The IBM General Parallel File System (GPFS), now branded as IBM Storage Scale, is a high-performance shared-disk file system that provides fast, reliable data access from all nodes of the cluster to a global filesystem. GPFS allows parallel applications simultaneous access to a set of files (even a single file) from any node that has the GPFS file system mounted, while providing a high level of control over all file system operations. These filesystems are recommended for most jobs, because GPFS provides high-performance I/O by "striping" blocks of data from individual files across multiple disks on multiple storage devices and reading/writing these blocks in parallel. In addition, GPFS can read or write large blocks of data in a single I/O operation, thereby minimizing overhead.
These are the GPFS filesystems available from all nodes:
The job scheduling system running at Altamira is SLURM. The current version of Slurm in Altamira is 23.02.7.
Slurm uses partitions to allocate jobs: a group of nodes with the same features is assigned to a partition. The following table summarises the main partitions and the principal properties of each. By default, the compute partition is used if the --partition option is not set in the job. Set it to a specific partition based on your user group, or use the testing partition to test jobs before submitting them to a production partition. Test jobs in the testing queue are limited to 8 CPUs and 3 hours of run time, as you can see in the QOS table below.
Partition | Description |
---|---|
wncompute_ifca | 10 nodes for ifca users |
wncompute_meteo | 30 nodes for the meteo group |
wncompute_astro | 20 nodes for the astro group |
wngpu | 5 nodes for advanced analytics computing |
compute | Main partition in Altamira |
res | Group of nodes for RES users |
testing | This partition can be used to test new workflows and also to help new users to familiarise themselves with the SLURM batch system. Both serial and parallel code can be tested on the testing partition. |
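As an illustration (the partition names are taken from the table above; adjust them to your group, and note that the script name is only an example), you can list the partitions visible to your account and select one at submission time:

```bash
# Show a summary of the partitions available to you and their state
sinfo -s

# Submit a job to a specific partition from the command line
sbatch --partition=wncompute_ifca job.sh
```

The same choice can be made inside the job script with a #SBATCH --partition directive, as described later in this guide.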
Specify the Quality of Service (QOS) for each job submitted to Slurm. Jobs request a QOS using the "--qos=" option of the sbatch, salloc, and srun commands. By default, a QOS is assigned to each user based on their group, so this option is not mandatory.
Group | Queue | Description | Limit number of nodes | Max CPU cores per user | Max Run Time | Max jobs per user |
---|---|---|---|---|---|---|
Ifca, astro, meteo, etc | main | Main queue for generic users | 32 | 512 | 3 days | 1000 |
Ifca, astro, meteo, etc | testing | This queue can be used to test new workflows and also to help new users to familiarise themselves with the SLURM batch system. Both serial and parallel code can be tested on the testing queue. | 4 | 32 | 3 hours | 2 |
Res | res_a | Main queue for RES users | 64 | 1024 (64 nodes) | 3 days | 500 |
Res | res_c | Queue assigned to RES users when they reach the limit of hours requested for the period | 4 | 64 (4 nodes) | 1 day | 10 |
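For example, a minimal sketch of requesting a QOS explicitly (queue names as in the table above; the script name is illustrative):

```bash
# Request a QOS on the command line
sbatch --qos=main job.sh

# Inspect the QOS definitions and limits known to the scheduler
sacctmgr show qos
```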
Once you have a username and its associated password you can get into the Altamira system by connecting to one of the 2 login nodes: login1.ifca.es for the first login node or login2.ifca.es for the second one (see Login node for more information about the login nodes). The password provided is temporary: you must change this initial password and synchronize the OTP after connecting to IPA, following the instructions provided by the support team. Also use a strong password (do not use a word or phrase from a dictionary and do not use a word that can be obviously tied to you as a person).
You must use Secure Shell (ssh) tools to log in to or transfer files to Altamira. We do not accept incoming connections from protocols such as telnet, ftp, rlogin, rcp, or rsh. The connection to the login nodes must be made in the following way:
$ ssh username@loginX.ifca.es (where X is 1 or 2)
Once you are logged into the login nodes you cannot make outgoing connections, for security reasons (contact us in exceptional cases).
If you cannot get access to the system after following this procedure, please contact us (see Getting Help to know how to reach us).
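As an illustration of the file-transfer side mentioned above (file names and paths are hypothetical), the usual scp commands work against the login nodes:

```bash
# Copy a local file to your home directory on Altamira
scp input.dat username@login1.ifca.es:~/

# Copy results back to your local machine
scp username@login1.ifca.es:~/results.tar.gz .
```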
As defined above, SLURM is the utility used at the HPC facility for batch processing support, so all jobs must be run through it. This part provides the information needed to get started with job execution at Altamira (see the official Slurm documentation for more details on how to create a job).
NOTE: In order to keep the login nodes at a proper load, a 10-minute limit on CPU time is set for processes running interactively on these nodes. Any execution taking more than this limit should be carried out through the queue system.
A job is the execution unit for SLURM. A job is defined by a script containing a set of directives describing the job and the commands to execute.
These are the basic directives to submit jobs:
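In practice, jobs are submitted and managed with the standard Slurm client commands; a minimal sketch (the script name is illustrative):

```bash
# Submit a job script to the batch system; the command prints the assigned job ID
sbatch job.sh

# Check the state of your queued and running jobs
squeue -u $USER

# Cancel a job by its ID
scancel <job_id>
```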
Alternatively, you can launch your jobs interactively and run your session on one of the compute nodes you have requested (useful, for instance, for graphical applications):
srun -N 2 --ntasks-per-node=8 --pty bash
This requests 2 nodes (-N 2) with a maximum of 8 tasks per node (--ntasks-per-node=8) and runs a login shell (bash). The --pty option is important: it gives you a login prompt and a session that looks very much like a normal interactive session, but it runs on one of the compute nodes.
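A variant for quick tests, using the testing partition and QOS described above (the resource figures are only an example and must respect the limits shown in the QOS table):

```bash
# One node, 4 tasks, one hour, interactive shell on a node of the testing partition
srun --partition=testing --qos=testing -N 1 --ntasks-per-node=4 --time=01:00:00 --pty bash
```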
A job must contain a series of directives to inform the batch system about the characteristics of the job, and you can configure them to fit your needs. These directives appear as comments in the job script, usually at the top just after the shebang line, with the following syntax:
#SBATCH --option=value
Note that these directives are only interpreted while they appear before the first executable command in the script; any #SBATCH line placed after that point is ignored.
This table describes the common directives you can define in your job (see an example below):
Directive | Description | Default value |
---|---|---|
--job-name=value | The name of the job that appears in the batch queue | script_name |
--partition=... | The name of the queue in slurm | compute |
--output=... | The name of the file to collect the standard output of the job. The %j part in the job directives will be substituted with the job ID. | file-%j.out |
--error=... | The name of the file to collect the standard error of the job. The %j part in the job directives will be substituted with the job ID. | file-%j.err |
--chdir=... | The working directory of your job (i.e. where the job will run). If not specified, it is the current working directory at the time the job was submitted. | submitting directory |
--qos=... | Quality of Service (or queue) where the job is allocated. By default, a queue is assigned for the user so this variable is not mandatory. | main |
--time=... | The limit of wall clock time. This is a mandatory field and you must set it to a value greater than the real execution time for your application and smaller than the time limits granted to the user. Notice that your job will be killed after the elapsed period. | qos default time limit |
--ntasks=... | The number of processes to allocate as parallel tasks. | 1 |
--cpus-per-task=... | Number of processors for each task. Without this option, the controller will just try to allocate one processor per task. The number of CPUs per task must be between 1 and 16, since each node has 16 cores (one core per thread). | 1 |
--ntasks-per-node | The number of tasks allocated on each node. When an application uses more than 3.8 GB of memory per process, it is not possible to fit 16 processes in the same node with its 64 GB of memory. It can be combined with --cpus-per-task to allocate nodes exclusively, e.g. to allocate 2 processes per node, set both directives to 2. The number of tasks per node must be between 1 and 16. | 1 |
--mem-per-cpu | Minimum memory required per allocated CPU. Default units are megabytes unless the SchedulerParameters configuration parameter includes the "default_gbytes" option for gigabytes. | DefMemPerCPU |
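To illustrate how these directives combine (the resource figures are only an example and must fit within the limits of your QOS), a job header for a parallel run could look like this:

```bash
#!/bin/bash
# Name shown in the queue, and files for stdout/stderr (%j becomes the job ID)
#SBATCH --job-name=mpi_test
#SBATCH --partition=compute
#SBATCH --output=mpi_test-%j.out
#SBATCH --error=mpi_test-%j.err
# 32 parallel tasks, 16 per node, 3000 MB of memory per allocated CPU
#SBATCH --ntasks=32
#SBATCH --ntasks-per-node=16
#SBATCH --mem-per-cpu=3000
# Wall clock limit of two hours
#SBATCH --time=02:00:00
```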
There are also a few SLURM environment variables you can use in your scripts:
Variable | Description |
---|---|
SLURM_JOBID | Specifies the job ID of the executing job |
SLURM_NPROCS | Specifies the total number of processes in the job. Same as -n, --ntasks |
SLURM_NNODES | Is the actual number of nodes assigned to run your job |
SLURM_PROCID | Specifies the MPI rank (or relative process ID) for the current process. The range is from 0 to SLURM_NPROCS-1 |
SLURM_NODEID | Specifies the relative node ID of the current job. The range is from 0 to SLURM_NNODES-1 |
SLURM_LOCALID | Specifies the node-local task ID for the process within a job |
SLURM_NODELIST | Specifies the list of nodes on which the job is actually running |
SLURM_SUBMIT_DIR | The directory from which sbatch was invoked. |
SLURM_MEM_PER_CPU | Memory available per CPU used |
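A small sketch showing how these variables can be used inside a job script (purely illustrative):

```bash
#!/bin/bash
#SBATCH --job-name=env_info
#SBATCH --ntasks=4
#SBATCH --time=00:05:00

# Log basic information about the allocation
echo "Job ${SLURM_JOBID} running on ${SLURM_NNODES} node(s): ${SLURM_NODELIST}"
echo "Total number of tasks: ${SLURM_NPROCS}"

# Run from the directory the job was submitted from
cd ${SLURM_SUBMIT_DIR}
```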
```bash
#!/bin/bash
#
#SBATCH --job-name=hello
#SBATCH --output=hello.out
#SBATCH --ntasks=1
#SBATCH --time=10:00

# From here the job starts
# To clean and load modules defined at the compile and link phases
module purge
module load PYTHON/3.11.1
# echo of commands
set -x
# To compute in the submission directory
cd ${SLURM_SUBMIT_DIR}
# execution
hostname
python3 example.py
```
Take into account that srun only works with version 4.1.0 of OPENMPI; other versions prompt an error. If you use another version of OPENMPI, call your program with mpirun instead.
```bash
#!/bin/bash
#
#SBATCH --job-name=hello
#SBATCH --output=hello.out
#SBATCH --ntasks=2
# The job spawns in 4 cores per task
#SBATCH --cpus-per-task=4
#SBATCH --time=10:00

# From here the job starts
# srun only works with this version of OPENMPI
module load OPENMPI/4.1.0
# echo of commands
set -x
# To compute in the submission directory
cd ${SLURM_SUBMIT_DIR}
# number of OpenMP threads
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# Binding OpenMP threads on core
export OMP_PLACES=cores
# execution with 'OMP_NUM_THREADS' OpenMP threads
srun openmp.sh
srun sleep 60
```
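If you have to use an OpenMPI version other than 4.1.0, the note above implies replacing the srun launch with mpirun; a minimal sketch of that execution line (the program name is illustrative):

```bash
# Launch the MPI program with as many ranks as tasks requested in the job
mpirun -np ${SLURM_NPROCS} ./my_mpi_program
```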
The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles. Each modulefile contains the information needed to configure the shell for an application or a compilation. Modules can be loaded and unloaded dynamically and atomically, in a clean fashion. All popular shells are supported, including bash, ksh, zsh, sh, csh, tcsh, as well as some scripting languages such as perl.
The most important commands of the module tool are: list, avail, load, unload, switch and purge.
module list shows all the modules you currently have loaded
module avail shows all the modules the user is able to load
module load loads the environment variables needed by the selected modulefile (PATH, MANPATH, LD_LIBRARY_PATH, etc.)
module unload removes all environment changes made by the module load command
module switch acts as the module unload and module load commands at the same time
module purge removes all the currently loaded modules
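A short interactive session illustrating these commands (the module names are those used in the examples of this guide):

```bash
module avail                  # see which modules can be loaded
module load PYTHON/3.11.1     # set up the environment for a module
module list                   # confirm what is currently loaded
module unload PYTHON/3.11.1   # undo the changes made by module load
module purge                  # remove every loaded module at once
```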
The required applications must be loaded inside the job script so that they are available on the worker nodes when the job needs them.
Please add an acknowledgement to IFCA at the University of Cantabria for the use of supercomputing resources, with a text similar to:
'We acknowledge IFCA computing support group at the University of Cantabria who provided access to the Advanced Computing and E-Science computing resources at the Institute of Physics of Cantabria (IFCA-CSIC) for performing simulations/analyses.'
IFCA provides consulting assistance to users. User support consultants are available during normal business hours, Monday to Friday, 9:00 to 17:00 (CEST).
User questions and support are handled at:
Spanish Supercomputing Network (RES) at <res_support@ifca.unican.es>
computing.support@ifca.unican.es
If you need assistance, please supply us with the nature of the problem, the date and time that the problem occurred, and the location of any other relevant information, such as output or log files.