The Titan cluster at LPTMS is a Linux cluster composed of one head node and twenty-two compute nodes. It uses the SLURM scheduler to keep track of available resources and to assign job requests efficiently to the compute resources (CPUs and GPUs). The cluster runs both single-processor and multi-processor jobs. In total it has 2914 CPUs, 3 GPUs and 8 TB of memory; the highest CPU count on a single node is 192, and the largest memory on a single node is 3 TB.
The head node is the server where you log in, compile code, submit jobs and coordinate them; the compute nodes run those jobs using their own CPUs and memory.
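To see the split between head node and compute nodes in practice, a minimal batch script such as the one below can be submitted from the head node with `sbatch`; its body then runs on whichever compute node SLURM allocates. The script name, job name and use of the debug partition are illustrative assumptions, not settings prescribed for Titan.

```shell
#!/bin/bash
#SBATCH --job-name=where-am-i
#SBATCH --partition=debug
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=where-am-i-%j.log

# Everything below runs on the compute node allocated by SLURM.
# Outside a SLURM job the variables fall back to placeholder values.
job_id="${SLURM_JOB_ID:-none}"
submit_host="${SLURM_SUBMIT_HOST:-unknown}"
compute_node="${HOSTNAME:-$(uname -n)}"

echo "Job ${job_id} submitted from ${submit_host} is running on ${compute_node}"
```

Submitting it with `sbatch where-am-i.sh` (a hypothetical file name) should produce a `where-am-i-<jobid>.log` file in which the submit host is the head node and the running host is a compute node.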
A partition represents a group of nodes in the cluster. There are several partitions:
q-1heure allows a maximum run time of 1 hour
q-1jour* allows a maximum run time of 1 day (the asterisk marks the default partition)
q-1sem allows a maximum run time of 1 week
q-2sem allows a maximum run time of 2 weeks
q-1mois allows a maximum run time of 1 month
q-2mois allows a maximum run time of 2 months
qbigmem allows a maximum run time of 1 week and is intended for large-memory jobs
qopnmp allows a maximum run time of 1 week and is intended for parallel jobs
debug is intended for testing
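A job must request a partition whose time limit covers its expected run time. The helper below is a hypothetical sketch that encodes the limits listed above for the general-purpose queues; it assumes a month counts as 30 days and takes whole hours, so the real limits should be checked with `sinfo -o "%P %l"` before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: print the shortest general-purpose partition whose
# maximum run time covers the requested number of whole hours.
# Limits follow the partition list; a month is assumed to be 30 days.
pick_partition() {
    local hours=$1
    if   [ "$hours" -le 1 ];    then echo "q-1heure"
    elif [ "$hours" -le 24 ];   then echo "q-1jour"
    elif [ "$hours" -le 168 ];  then echo "q-1sem"
    elif [ "$hours" -le 336 ];  then echo "q-2sem"
    elif [ "$hours" -le 720 ];  then echo "q-1mois"
    elif [ "$hours" -le 1440 ]; then echo "q-2mois"
    else
        echo "no partition allows ${hours} hours" >&2
        return 1
    fi
}
```

For example, `sbatch --partition="$(pick_partition 30)" --time=30:00:00 job.sh` would send a hypothetical `job.sh` to q-1sem, since 30 hours exceeds the one-day limit. Options given on the `sbatch` command line override the matching `#SBATCH` lines inside the script.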
Using GPUs
The Titan cluster contains five GPUs (Tesla T4 and NVIDIA L4) located on different compute nodes. The NVIDIA CUDA Toolkit is installed on these nodes; it provides a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing application performance.
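A quick way to confirm that a job has landed on a GPU node is to look for the NVIDIA tools there. The script below is an illustrative sketch, not a script provided on the cluster: the file name `check-gpu.sh` is hypothetical, and `nvcc` may only be on `PATH` once the CUDA Toolkit environment has been set up on that node.

```shell
#!/bin/bash
# Sketch: report which CUDA tools are visible on the current node.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
        return 0
    fi
    echo "$1: not found on ${HOSTNAME:-$(uname -n)}"
    return 1
}

# Count how many of the expected NVIDIA tools are missing.
missing=0
check_tool nvidia-smi || missing=$((missing + 1))
check_tool nvcc || missing=$((missing + 1))

# List the GPUs and their memory only when the driver tools are available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv
fi
```

It could be run through SLURM with, for example, `srun --partition=gpu --gres=gpu:1 --time=00:05:00 bash check-gpu.sh`; the partition name comes from the batch script on this page, while the GRES name `gpu` is an assumption.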
Useful links:
Introduction to CUDA C++
PyCUDA
Numba
To request a GPU for a Python job, use a batch script like the following:
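#!/bin/bash
#SBATCH --job-name=celloracle
#SBATCH --partition=gpu
#SBATCH --ntasks-per-node=1
# Request one GPU (assumes the GPU resource is named "gpu"; check with: sinfo -o "%N %G")
#SBATCH --gres=gpu:1
#SBATCH --mem-per-cpu=10G
#SBATCH --time=1:00:00
#SBATCH --output=celloracle-%J.log

# Run the Python script inside the conda environment "tf" on the allocated node.
srun conda run -n tf python ./load-tensor.py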
Using programming languages
Several programming languages are installed on the Titan cluster, including Python, C/C++, Fortran and Julia.
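As an illustration of using one of these languages inside a job, the sketch below compiles and runs a tiny OpenMP program in C. It assumes that `gcc` with OpenMP support is on `PATH` and that the qopnmp partition is meant for shared-memory (OpenMP-style) parallel jobs, which its name suggests but this page does not state; the job name and CPU count are hypothetical.

```shell
#!/bin/bash
#SBATCH --job-name=omp-hello
#SBATCH --partition=qopnmp
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=00:10:00
#SBATCH --output=omp-hello-%j.log

# Use one OpenMP thread per CPU allocated by SLURM (1 outside a job).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

workdir=$(mktemp -d)
cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>

int main(void) {
    int total = 0;
    /* Each thread adds 1, so total ends up equal to the number of threads. */
    #pragma omp parallel reduction(+:total)
    total += 1;
    printf("%d\n", total);
    return 0;
}
EOF

if gcc -fopenmp -O2 -o "$workdir/hello" "$workdir/hello.c"; then
    threads_used=$("$workdir/hello")
    echo "OpenMP ran with ${threads_used} threads on ${HOSTNAME:-$(uname -n)}"
else
    threads_used=""
    echo "gcc with OpenMP support is not available here" >&2
fi
rm -rf "$workdir"
```

Tying `OMP_NUM_THREADS` to `SLURM_CPUS_PER_TASK` keeps the number of threads equal to the number of CPUs SLURM allocated, so the program does not oversubscribe the node it shares with other jobs.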
Using SLURM
Here are more details about the Titan cluster.
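The commands below sketch a typical SLURM session on the head node. They are standard SLURM client commands; `job.sh` is a hypothetical batch script, and the guard at the top only exists so that the snippet exits cleanly on a machine where SLURM is not installed.

```shell
#!/bin/bash
# Typical SLURM session on the head node (job.sh is a hypothetical script).
if ! command -v sbatch >/dev/null 2>&1; then
    echo "SLURM commands not found: run this on the Titan head node." >&2
    slurm_available=no
else
    slurm_available=yes

    # Partitions, time limits and node states (default partition marked with *).
    sinfo -s
    # Per-node view: node name, CPUs, memory in MB, generic resources such as GPUs.
    sinfo -N -o "%N %c %m %G"

    # Submit; --parsable prints only the job ID (plus ";cluster" on multi-cluster setups).
    job_id=$(sbatch --parsable job.sh)
    job_id=${job_id%%;*}

    squeue -u "$USER"              # your pending and running jobs
    scontrol show job "$job_id"    # full details of the submitted job
    # scancel "$job_id"            # uncomment to cancel the job

    # Accounting information, most useful once the job has finished.
    sacct -j "$job_id" --format=JobID,JobName,Partition,State,Elapsed,MaxRSS
fi
```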