...

System Overview

Altamira includes 158 main compute nodes, 4 additional GPU compute nodes and 2 login servers.

Each main compute node has two Intel Sandy Bridge E5-2670 processors, each with 8 cores operating at 2.6 GHz and 20 MB of cache, plus 64 GB of RAM (i.e. 4 GB/core) and a 500 GB local disk.
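The per-core memory figure and the aggregate size of the main partition follow directly from the numbers above. The short sketch below only redoes that arithmetic; the variable names are illustrative, and the GPU and login nodes are not counted.

```shell
#!/bin/bash
# Figures taken from the System Overview (main compute nodes only).
main_nodes=158
sockets_per_node=2
cores_per_socket=8
ram_per_node_gb=64

cores_per_node=$(( sockets_per_node * cores_per_socket ))
total_cores=$(( main_nodes * cores_per_node ))
ram_per_core_gb=$(( ram_per_node_gb / cores_per_node ))
total_ram_gb=$(( main_nodes * ram_per_node_gb ))

echo "cores per node:   ${cores_per_node}"
echo "total main cores: ${total_cores}"
echo "RAM per core:     ${ram_per_core_gb} GB"
echo "total main RAM:   ${total_ram_gb} GB"
```

A job that needs more than 4 GB per task therefore cannot use all 16 cores of a main node at once.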

...

All the nodes are connected to a global storage system based on GPFS (General Parallel File System) providing a total storage of 1 PB.

File Systems

Each node has several areas of disk space for installing system software or programs and storing files. These areas may have size or time limits; please read this whole section carefully to learn the usage policy for each of these filesystems. There are 2 different types of storage available inside a node:

  • GPFS Filesystems: GPFS is a distributed networked filesystem that can be accessed from all the nodes.
  • Local Hard Drive: Every compute node has an internal hard drive.

GPFS filesystem

The IBM General Parallel File System (GPFS) is a high-performance shared-disk file system that provides fast, reliable data access from all blades of the cluster to a global filesystem. GPFS gives parallel applications simultaneous access to a set of files (even a single file) from any node that has the GPFS file system mounted, while providing a high level of control over all file system operations. These filesystems are recommended for most jobs, because GPFS provides high-performance I/O by "striping" blocks of data from individual files across multiple disks on multiple storage devices and reading/writing these blocks in parallel. In addition, GPFS can read or write large blocks of data in a single I/O operation, thereby minimizing overhead.
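To make the idea of striping concrete, the sketch below distributes the blocks of one file over several disks in simple round-robin order. It is a simplified illustration only: the block size, disk count and file size are hypothetical values, and the real GPFS allocation logic is more involved than a plain modulo.

```shell
#!/bin/bash
# Hypothetical values, chosen only for illustration.
block_size_mb=4
num_disks=3
file_size_mb=18

# Number of blocks needed, rounding up so a partial last block still counts.
num_blocks=$(( (file_size_mb + block_size_mb - 1) / block_size_mb ))

# Assign block i to disk (i mod num_disks), i.e. plain round-robin striping.
placement=""
block=0
while [ "$block" -lt "$num_blocks" ]; do
    disk=$(( block % num_disks ))
    echo "block ${block} -> disk ${disk}"
    placement="${placement}${disk}"
    block=$(( block + 1 ))
done
```

Because consecutive blocks land on different disks, several of them can be read or written at the same time, which is where the parallel I/O gain comes from.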

These are the GPFS filesystems available in Altamira from all nodes:

  • /gpfs/users/res → /home: users' home directories
  • /gpfs/res_projects:
  • /gpfs/res_apps:
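One quick way to see these filesystems from a node is to ask `df` for each path. The sketch below is a minimal example, assuming the paths listed above are visible on the node where it runs; `report_fs` is a hypothetical helper name, and on any other machine the script simply reports that the path is missing.

```shell
#!/bin/bash
# Print usage for a path if it exists on this machine, or a note if it does not.
report_fs() {
    if [ -d "$1" ]; then
        df -h "$1"
    else
        echo "$1: not available on this machine"
    fi
}

# Paths taken from the list above.
for fs in /home /gpfs/res_projects /gpfs/res_apps; do
    report_fs "$fs"
done
```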

Batch system

The job scheduling system running at Altamira is SLURM. The current version of SLURM at Altamira is 17.11.

Running Jobs

As described above, SLURM is the utility used at Altamira for batch processing, so all jobs must be run through it. This section provides information for getting started with job execution at Altamira.

...

A job is the execution unit for SLURM. A job is defined by a text file script containing a set of directives describing the job, followed by the commands to execute.
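A minimal job script might look like the sketch below. It uses only standard SLURM options (`--job-name`, `--output`, `--error`, `--ntasks`, `--time`); the job name, file names and resource values are hypothetical, and any Altamira-specific partitions or limits are not shown here.

```shell
#!/bin/bash
# Directives: lines starting with #SBATCH are read by sbatch, not by the shell.
# Hypothetical job name.
#SBATCH --job-name=example_job
# Standard output and error files; %j expands to the job ID.
#SBATCH --output=example_job_%j.out
#SBATCH --error=example_job_%j.err
# Number of tasks and wall-clock limit (HH:MM:SS).
#SBATCH --ntasks=16
#SBATCH --time=00:10:00

# Commands: everything below runs once the job starts.
# SLURM_JOB_ID is set by SLURM inside a job; "unknown" is used outside one.
job_summary="Job ${SLURM_JOB_ID:-unknown} running on $(uname -n)"
echo "${job_summary}"
```

Since the directives are ordinary shell comments, the same file can also be run directly with a shell for a quick syntax check before it is submitted.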

These are the basic commands for submitting and managing jobs:

  • sbatch <job script>: submits a job script to the queue system (see below for job script directives).
  • squeue [-u user]: shows the jobs submitted by all users, or only those of the given user if you specify the -u option.
  • scancel <job id>: removes the job from the queue system, canceling its execution if it was already running.
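The three commands above are typically used together: submit, check the queue, and cancel if needed. The sketch below wraps them in small helper functions; the function names and the `job.sh` file name are hypothetical, and the commands only work on a machine where SLURM is installed.

```shell
#!/bin/bash
# Submit a job script and print only the numeric job ID.
# --parsable makes sbatch print "jobid" (or "jobid;cluster") instead of a sentence.
submit_job() {
    out=$(sbatch --parsable "$1") || return 1
    echo "${out%%;*}"
}

# List the jobs of one user (defaults to the current user).
show_my_jobs() {
    squeue -u "${1:-$(id -un)}"
}

# Remove a job from the queue, stopping it if it is already running.
cancel_job() {
    scancel "$1"
}

# Typical session on a login node (not executed here):
#   job_id=$(submit_job job.sh)
#   show_my_jobs
#   cancel_job "$job_id"
```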

Getting Help

IFCA provides consulting assistance to users. User support consultants are available during normal business hours, Monday to Friday, 9:00 to 17:00 (CEST).

...

If you need assistance, please tell us the nature of the problem, the date and time it occurred, and the location of any other relevant information, such as output or log files.

...