A dashboard for on-line assessment of job execution efficiency

Project reference: 2205
Jobs are the essence of supercomputing. Users create jobs to address scientific and societal challenges, and the supercomputer executes them to produce the desired results. Crafting efficient jobs that maximise the performance (or minimise the energy consumption) of the underlying supercomputer hardware is a daunting task for the user. Run-time libraries have been developed to turn computational inefficiencies into energy savings, reducing the data centre’s carbon footprint. It is thus important to make users aware of the computational inefficiencies in their jobs and of the energy savings achievable by enabling energy-saving mechanisms in their job runs.
This project aims to design a live dashboard of job execution, providing the user and the system administrator with run-time information on the use of microarchitectural components, communication primitives, and computational imbalance.
The live job dashboard will be built on top of the Countdown and Examon frameworks. The first is an MPI energy-saving run-time library deployed at CINECA; the second is a big-data monitoring framework deployed at CINECA to monitor the status of the supercomputer in real time. Dashboards will be created using Grafana.
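As an illustration of the kind of run-time information such a dashboard could show, the following is a minimal sketch of two common job-level indicators: the fraction of time spent in MPI communication and the load balance across ranks. The `RankSample` record and its field names are hypothetical and do not reflect the actual Countdown or Examon data formats; the per-rank timings are assumed to be already available.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RankSample:
    """Hypothetical per-rank timing record (not the Countdown format)."""
    rank: int
    compute_s: float  # seconds spent in application code
    mpi_s: float      # seconds spent inside MPI calls


def mpi_fraction(samples):
    """Share of total rank time spent inside MPI, between 0 and 1."""
    if not samples:
        raise ValueError("at least one rank sample is required")
    total = sum(s.compute_s + s.mpi_s for s in samples)
    if total == 0:
        return 0.0
    return sum(s.mpi_s for s in samples) / total


def load_balance(samples):
    """Mean compute time divided by the maximum compute time.

    A value of 1.0 means perfectly balanced ranks; lower values mean
    that some ranks wait while the slowest one finishes its work.
    """
    if not samples:
        raise ValueError("at least one rank sample is required")
    compute = [s.compute_s for s in samples]
    longest = max(compute)
    if longest == 0:
        return 1.0
    return mean(compute) / longest


if __name__ == "__main__":
    job = [RankSample(0, 8.0, 2.0), RankSample(1, 4.0, 6.0)]
    print(f"MPI fraction: {mpi_fraction(job):.2f}")
    print(f"Load balance: {load_balance(job):.2f}")
```

Values like these could be computed per job and exposed as time series for a Grafana panel; which metrics are actually shown depends on what Countdown reports.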
Project Mentor: Andrea Bartolini
Project Co-mentor: Martin Molan
Site Co-ordinator: Massimiliano Guarrasi
Learning Outcomes:
Increase the student’s skills in modelling, understanding and visualising the computational characteristics of scientific applications.
Student Prerequisites (compulsory):
Computer architecture, basic knowledge of parallel computing.
Student Prerequisites (desirable):
MPI, Python, JSON, previous experience with data visualisation.
Training Materials:
https://www.youtube.com/watch?v=5ofvPKBzU40
Workplan:
Week 1: Common Training session
Week 2: Introduction to CINECA systems, small tutorials on big data, data collection systems, Countdown reporting, and detailed work planning.
Week 3: Study of the information reported by Countdown and of Grafana dashboards.
Week 4, 5: Design and testing of the Grafana Dashboard for Jobs.
Week 6, 7: Final stage of the production phase. Implementation of feedback from domain experts and end-users (CINECA staff).
Week 8: Finish the final video. Write the final report.
Final Product Description:
A live dashboard of computational and energy efficiency of jobs in production.
Adapting the Project: Increasing the Difficulty:
Develop advanced performance metrics derived from the Countdown data. Develop more complex visualisations.
Adapting the Project: Decreasing the Difficulty:
Design basic visualisations.
Resources:
The student will have access to our facility, our HPC systems, the databases of monitoring data, and the Grafana frontend. The student will be provided with scientific applications and computational hours to collect and analyse job information and to design the dashboard.
Organisation:
CINECA – Consorzio Interuniversitario