Graphical interface for real time monitoring, automatic event detection, and alert triggering in HPC parallel software

Project reference: 1502

Large HPC physics simulations can show local signs of failure or success long before a global signature is observed. If these signs are noticed early, valuable computing resources can be spared. We have created a system that monitors software in real time, looking for predefined signatures of problems. When one is detected, the software can quickly and automatically render the problem area and send an alert to the user, who can evaluate the seriousness of the issue and decide whether to cancel the computation early.
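As an illustration of what a predefined signature could look like, the following is a minimal Python sketch and not the detection logic used in the actual system. It assumes a hypothetical stream of (step, subdomain, residual) samples and a hypothetical rule: raise an alert when a residual is non-finite, or when it stays above a threshold for several consecutive samples of the same subdomain. The names `Alert`, `detect_alerts`, `threshold` and `patience` are invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Alert:
    step: int        # time step at which the signature was confirmed
    subdomain: int   # hypothetical id of the mesh partition that triggered it
    reason: str


def detect_alerts(samples, threshold=1e3, patience=3):
    """Scan (step, subdomain, residual) samples and return a list of alerts.

    Hypothetical signature: a residual that is non-finite, or that stays
    above `threshold` for `patience` consecutive samples of one subdomain.
    """
    streak = {}      # subdomain -> consecutive samples above threshold
    alerts = []
    for step, subdomain, residual in samples:
        if not math.isfinite(residual):
            alerts.append(Alert(step, subdomain, "non-finite residual"))
            streak[subdomain] = 0
            continue
        if residual > threshold:
            streak[subdomain] = streak.get(subdomain, 0) + 1
            # Alert once, at the moment the streak reaches `patience`.
            if streak[subdomain] == patience:
                alerts.append(Alert(step, subdomain, "residual above threshold"))
        else:
            streak[subdomain] = 0
    return alerts


if __name__ == "__main__":
    demo = [(1, 0, 10.0), (1, 1, 2e3), (2, 1, 3e3), (3, 1, 4e3)]
    for alert in detect_alerts(demo):
        print(alert)
```

Because each alert carries the subdomain id, a front end could use it to decide which region of the mesh to render and show to the user.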

In this project we will create a web-accessible interface that allows users to manage their running jobs and any current alerts. Users will also be able to monitor global metrics of the software to evaluate general performance, including 3D representations of the computational mesh and its domain decomposition. The system is currently implemented over Alya, a multiphysics simulation code included in the PRACE benchmark suite.

 


 

Project mentor: Fernando Cucchietti

 

Site Co-ordinator: Maria Ribera Sancho

 

Learning Outcomes

The student will learn data visualization techniques and technologies, especially as applied to dashboard design.

 

Student Prerequisites (compulsory)

Proficiency with a scripting language (Bash and Python preferred) and with JavaScript/HTML

 

Student Prerequisites (desirable)

Experience with D3.js, WebSockets, or Node.js

 

Training Materials

http://d3js.org/

http://nodeschool.io/

http://threejs.org/docs/

 

Workplan

  • Week 1: Training week
  • Week 2: Literature Review Preliminary Report (Plan writing)
  • Weeks 3–7: Project Development
  • Week 8: Final Report write-up

Final Product Description

The final product will allow real-time monitoring of production-scale HPC software and help prevent the unnecessary use of HPC resources.

 

Adapting the Project – Increasing the Difficulty

The project is at the appropriate cognitive level, taking into account the timeframe and the need to submit a final working product and two reports.

 

Adapting the Project – Decreasing the Difficulty

The interface will be designed in full, but some features may be left undeveloped to ensure that a working product with a limited feature set is delivered at the end of the project.

 

Resources

The student will need access to standard computing resources (laptop, internet connection) as well as an account on MareNostrum.

 

Organization
Barcelona Supercomputing Centre
