Porting and benchmarking on a fully ARM-based cluster

Project reference: 2006
The aim of this project is to explore the limits of our fully ARM-based HPC cluster, Fulhame. This is a relatively new cluster, and whilst some codes and libraries have already been ported and optimised, much work remains in porting and optimising them and in understanding the strengths and weaknesses of the system compared to others.
This project is well suited to a student looking for experience in practical HPC work: building, installing and configuring codes and systems. Any subject area is possible, for example power-efficiency optimisation, I/O optimisation with Lustre on ARM, or MPI scaling.

Open cabinet showing several nodes of Fulhame.
Project Mentor: Nick Johnson
Project Co-mentor: /
Site Co-ordinator: Juan Herrera
Participants: Jerónimo Sánchez García, Irem Kaya
Learning Outcomes:
The student will learn how to build, install and optimise HPC libraries and codes for the ARM architecture.
The student will learn how a production HPC cluster is operated.
Student Prerequisites (compulsory):
The student must have experience in building and compiling codes and libraries for a cluster system.
Student Prerequisites (desirable):
Some experience of package management, job and code profiling and optimisation, and system-level parameters (e.g. DVFS) would be desirable but is not essential.
Training Materials:
No specific materials at present. Some will be provided closer to the placement, in consultation with the student, depending on which area of the project they want to focus on.
Workplan:
- Week 1+2: Run basic tests on the system, get access to all codes/libraries, and begin to identify (if not already decided beforehand) which specific areas to focus on for the rest of the project.
- Week 3+4: Analyse the codes/libraries/parameters that require work, produce a plan, then begin work.
- Week 5+6: Carry out the main bulk of the work, branching into secondary problems/codes if necessary.
- Week 7+8: Finalise the work, push code to upstream repositories, write the report and give an EPCC talk on the work done and results achieved.
Final Product Description:
A technical paper or report describing the work carried out, plus code commits to upstream repositories proposing fixes for any bugs found.
Adapting the Project: Increasing the Difficulty:
Work on more codes or libraries, or look at system parameters.
Adapting the Project: Decreasing the Difficulty:
Concentrate more on application-level tuning using standard profilers, debuggers, etc.
Resources:
Access to the ARM cluster will be required and will be provided by EPCC.
Organisation:
EPCC