Benchmarking HEP workloads on HPC facilities
Project reference: 2109
The High Energy Physics (HEP) community runs a large number of diverse workloads (e.g. pure statistical analysis vs. LHC event reconstruction) that differ not only in their requirements (compute-driven vs. I/O-driven) but also in their ability to exploit heterogeneous resources. To standardize these workflows and to simplify the evaluation of new computing platforms, a container-based benchmarking suite was developed, which has only recently begun to be tested at HPC facilities. The goal of this project is to contribute to this effort by adding new types of workflows to the existing set and, at the same time, testing HEP workloads on new types of hardware such as NVIDIA GPUs, AMD CPUs and GPUs, and ARM processors. Furthermore, it would be interesting to understand how to combine efforts with the PRACE benchmarking tools (i.e. UEABS).
Project Mentor: Maria Girone
Project Co-mentors: David Southwick (CERN) and Sagar Dolas (SURF)
Site Co-ordinators: Maria Girone (CERN) and Carlos Teijeiro Barjas (SURF)
Learning Outcomes:
- The student will become familiar with containers and how they are integrated into the batch systems of HPC facilities.
- Experience using heterogeneous computing architectures and a working knowledge of running scientific workloads on these platforms.
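To make the container/batch-system integration concrete, here is a minimal sketch that generates a Slurm batch script launching a containerised workload with Apptainer (formerly Singularity). The image name, benchmark command and helper function are hypothetical, and the GPU request syntax (`--gres=gpu:N`) varies between sites; `--nv` and `--rocm` are the Apptainer options that expose NVIDIA and AMD GPUs inside the container.

```python
from __future__ import annotations

# Minimal sketch: build a Slurm batch script that runs a containerised
# benchmark with Apptainer. Image and command names are hypothetical.

# Apptainer flags that make the host GPU stack visible inside the container.
GPU_FLAGS = {"nvidia": "--nv", "amd": "--rocm"}


def make_batch_script(job_name: str, image: str, command: str,
                      nodes: int = 1, time_limit: str = "01:00:00",
                      gpus: int = 0, gpu_vendor: str = "nvidia") -> str:
    """Return the text of a Slurm batch script running `command` inside `image`."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={time_limit}",
    ]
    exec_flags: list[str] = []
    if gpus > 0:
        # Generic GPU request; the exact GRES name is site-specific.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
        exec_flags.append(GPU_FLAGS[gpu_vendor])  # KeyError for unknown vendors
    # srun starts the containerised command on the allocated resources.
    lines.append(" ".join(["srun", "apptainer", "exec", *exec_flags, image, command]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = make_batch_script("hep-bench", "hep-workload.sif",
                               "./run_benchmark.sh", gpus=1)
    print(script, end="")
```

The printed script can be saved to a file and submitted with `sbatch`; keeping the container command on a single `srun` line means the same template works for CPU-only and GPU runs.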
Student Prerequisites (compulsory):
- Working knowledge of at least one programming language
- Bash (listed separately from the programming language above)
- Some familiarity with distributed systems (i.e. anything beyond a single node)
Student Prerequisites (desirable):
- Some familiarity with containers
- Some familiarity with Slurm
Training Materials:
Docker’s how-to documentation is quite good. Although Docker itself is not widely used at HPC facilities, it is still a good starting point for learning what containers are and why they are needed.
Workplan:
- Week 1-2: Start by analyzing the existing code base of the benchmarking suite and how its container images are generated.
- Week 3-4: Incorporate several CPU/GPU based workloads into the suite and generate images for them.
- Week 5-6: Test the images at an HPC facility. It is important to test the images on different architectures.
- Week 7-8: Push all developed updates upstream to the main repository. Write a report and give presentations about the experience.
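Because the same workload has to be tested on several architectures, one simple way to keep the runs consistent is to build one image per architecture and pick the matching one at run time. The sketch below assumes a hypothetical `hep-workload:<arch>` tagging scheme; `platform.machine()` is the standard-library call that reports the host's CPU architecture.

```python
from __future__ import annotations

import platform

# Hypothetical mapping from reported CPU architecture to an image tag;
# the real suite's registry and naming scheme may differ.
IMAGE_TAGS = {
    "x86_64": "hep-workload:x86_64",
    "amd64": "hep-workload:x86_64",    # Windows reports "AMD64"
    "aarch64": "hep-workload:aarch64",
    "arm64": "hep-workload:aarch64",   # macOS reports "arm64"
    "ppc64le": "hep-workload:ppc64le",
}


def select_image(machine: str | None = None) -> str:
    """Return the image tag matching `machine` (defaults to the current host)."""
    arch = (machine or platform.machine()).lower()
    if arch not in IMAGE_TAGS:
        raise ValueError(f"no image built for architecture {arch!r}")
    return IMAGE_TAGS[arch]


if __name__ == "__main__":
    for arch in ("x86_64", "aarch64", "ppc64le"):
        print(arch, "->", select_image(arch))
```

Failing loudly on an unknown architecture avoids silently running an emulated or mismatched image, which would distort the performance numbers.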
Final Product Description:
- Integrate several new types of workloads (e.g. GPU-based workloads)
- Provide performance figures of merit for the newly integrated workloads, and for existing workloads that have not yet been tested on the new types of hardware.
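As an illustration of such figures of merit, the sketch below computes a throughput (events per second) with its run-to-run spread, and combines several workloads into one score via the geometric mean of their ratios to a reference machine. The geometric mean is a common convention for composite benchmark scores, but it is an assumption here: the suite's actual scoring formula, and the workload names used, may differ.

```python
from __future__ import annotations

import statistics


def throughput(events: int, wall_seconds: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of events/s over repeated runs."""
    rates = [events / t for t in wall_seconds]
    spread = statistics.stdev(rates) if len(rates) > 1 else 0.0
    return statistics.mean(rates), spread


def combined_score(measured: dict[str, float], reference: dict[str, float]) -> float:
    """Geometric mean of per-workload throughputs normalised to a reference machine."""
    ratios = [measured[name] / reference[name] for name in reference]
    return statistics.geometric_mean(ratios)


if __name__ == "__main__":
    mean, spread = throughput(1000, [9.8, 10.0, 10.2])
    print(f"{mean:.1f} +/- {spread:.1f} events/s")
    # Hypothetical workload names and numbers, for illustration only.
    print(combined_score({"reco": 400.0, "gen": 100.0},
                         {"reco": 100.0, "gen": 100.0}))
```

Normalising to a reference before averaging keeps a single very fast workload from dominating the combined score.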
Adapting the Project: Increasing the Difficulty:
Of course, simply running a workload on a new type of hardware might sound too simple. To increase the difficulty, we would expect the student not only to run but also to profile a given workload, using whatever tools the HPC site provides for this purpose (we would expect more than just “perf” to be available).
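A lightweight way to make profiling part of the same workflow is to prefix the workload's command line with whichever profiler the site offers. In this sketch the `perf stat` and `nsys profile` options are real, but the profiler table, report name and helper functions are hypothetical, and a given site may provide different tools that need their own prefixes.

```python
from __future__ import annotations

import subprocess

# Command prefixes for a few profilers. The flags are standard options of
# these tools, but which profilers exist (and how they are loaded, e.g. via
# environment modules) depends on the HPC site.
PROFILERS = {
    "none": [],
    # Hardware counter summary; "--" separates perf's options from the command.
    "perf": ["perf", "stat", "-e", "cycles,instructions", "--"],
    # NVIDIA Nsight Systems timeline, written to a report named "hep-profile"
    # (hypothetical name).
    "nsys": ["nsys", "profile", "-o", "hep-profile"],
}


def profiled_command(workload: list[str], profiler: str = "perf") -> list[str]:
    """Prefix the workload command line with the chosen profiler."""
    return [*PROFILERS[profiler], *workload]


def run_profiled(workload: list[str], profiler: str = "perf") -> int:
    """Run the (possibly containerised) workload under the profiler; return its exit code."""
    return subprocess.run(profiled_command(workload, profiler), check=False).returncode
```

For example, `profiled_command(["apptainer", "exec", "hep-workload.sif", "./run_benchmark.sh"])` wraps the whole container launch. Placing the profiler outside `apptainer exec` also measures the container runtime; placing it inside requires the profiler to be installed in the image.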
Adapting the Project: Decreasing the Difficulty:
The difficulty could be decreased by only testing the available images, without diving into the details of a given container framework or how the images are generated. This would still provide a basic overview of container usage at an HPC site.
Resources:
- Access to an HPC site, together with support from the site for co-mentoring the student and providing HPC expertise.
*Online only in any case