Jumping into the HPC world!

The training week in Bologna went by quickly. It was time to pack the luggage once more, and now Sean and I are in Luxembourg (Esch-sur-Alzette, to be precise). On the first day, we were kindly welcomed by the research group. A complete tour of the HPC facility helped us better understand how a real supercomputer is built and maintained.

A visit to the national supercomputer in Luxembourg

My project at the Summer of HPC will focus on analyzing and reporting energy consumption in HPC facilities. In the previous post I mentioned that these centres are power hungry. As an example, the supercomputer at CINECA, where the training week was held, is listed as one of the major electricity consumers in the region (and it might even be asked to switch off some of its nodes when regional energy demand is too high). For this reason, keeping track of energy usage and reporting it to users is a key part of the system.

The campus is located next to the Belval’s blast furnaces

More and more attention is being paid to the efficiency of such systems, and peak performance is no longer the only indicator. The Green500 ranks the world's supercomputers by GFLOPS per watt, which is, simply put, the amount of computation delivered per unit of power. In recent years, users have increasingly requested GPUs because they speed up the training of deep neural networks. Interest is shifting towards such accelerators, and thus their consumption must be monitored and profiled.
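To make the metric concrete, here is a minimal sketch of the arithmetic behind a GFLOPS-per-watt figure. The function name and the numbers are hypothetical, chosen only for illustration; they are not taken from any Green500 entry.

```python
def gflops_per_watt(sustained_gflops: float, average_power_watts: float) -> float:
    """Energy efficiency: sustained performance divided by average power draw."""
    if average_power_watts <= 0:
        raise ValueError("average power must be positive")
    return sustained_gflops / average_power_watts


# Hypothetical machine: 2 PFLOPS sustained (2,000,000 GFLOPS) drawing 400 kW.
efficiency = gflops_per_watt(2_000_000, 400_000)
print(f"{efficiency:.1f} GFLOPS/W")
```

Since one watt is one joule per second, a value in GFLOPS per watt can also be read as billions of floating-point operations per joule of energy.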

Before going into the details of the project, I will first explain the context. Each user of such a cluster submits a job as a set of instructions, which is scheduled and sent to the computing nodes, where the results are computed. Slurm is an open-source job scheduling system for large and small Linux clusters, and it is adopted in most HPC facilities. Slurm can be extended with plugins that are either built into the source code or loaded through SPANK (Slurm Plug-in Architecture for Node and job (K)control).

Many packages and libraries can be used to retrieve energy consumption, such as LIKWID or PAPI. My project focuses on the consumption of the aforementioned GPUs at a low level. NVML, the management library provided by NVIDIA, one of the main GPU manufacturers, makes it possible to read the power draw of each GPU attached to the mainboard at a given time. These readings will be saved in an HDF file, a format designed to store and organize large amounts of data. The data will then be analyzed to report the electricity consumed by the requested job.
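The sampling-and-reporting idea can be sketched in Python. This is a sketch under assumptions, not the plugin itself: it assumes the `pynvml` bindings are installed for the real GPU reader, and the helper names `nvml_power_reader`, `sample_power` and `energy_joules` are made up for illustration. NVML reports power in milliwatts, so the reader converts to watts; the energy is then estimated by trapezoidal integration of the samples.

```python
import time
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, power in watts)


def nvml_power_reader(gpu_index: int = 0) -> Callable[[], float]:
    """Return a function that reads one GPU's power draw in watts via NVML.

    Assumes the pynvml package and an NVIDIA driver are available.
    The caller should call pynvml.nvmlShutdown() when finished.
    """
    import pynvml  # imported lazily so the rest runs without a GPU

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    # nvmlDeviceGetPowerUsage returns milliwatts.
    return lambda: pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0


def sample_power(
    read_watts: Callable[[], float],
    n_samples: int,
    interval_s: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Sample]:
    """Collect n_samples (timestamp, watts) pairs, waiting interval_s between reads."""
    samples: List[Sample] = []
    for i in range(n_samples):
        samples.append((clock(), read_watts()))
        if i < n_samples - 1:
            sleep(interval_s)
    return samples


def energy_joules(samples: List[Sample]) -> float:
    """Estimate energy by trapezoidal integration of power over time."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


if __name__ == "__main__":
    # Stand-in reader so the example runs anywhere; on a GPU node,
    # replace it with nvml_power_reader(0).
    def fake_reader() -> float:
        return 150.0

    samples = sample_power(fake_reader, n_samples=5, interval_s=0.01)
    print(f"{energy_joules(samples):.3f} J over {len(samples)} samples")
```

Storing the samples in an HDF file is left out here to keep the sketch free of extra dependencies. The sampling interval is a trade-off: shorter intervals follow power spikes more closely but add overhead on the node.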

The PRACE summer school is giving me the opportunity both to learn in an exciting environment and to travel to nearby places in my free time. Luxembourg City offers nice views, museums and bars. Moreover, the Grand-Duché is located in the centre of Europe, which makes it easy to reach cities in Belgium, Germany and France. Below is an image from one of the weekends, spent in Bruges. See you in the next blog post!

A snapshot from Bruges

Computer engineering graduate from the University of Padova. Passionate about coding since day 0. Spending the summer at the University of Luxembourg, developing a plugin to enhance Slurm's energy reporting capabilities.
