Performance of Parallel Python Programs on ARCHER2

Project reference: 2115
Python is widely used in scientific research for tasks such as data processing, analysis and visualisation. It is not yet widely used for large-scale modelling and simulation on high performance computers, however, because of its poor performance: Python is primarily designed for ease of use and flexibility, not for speed. Even so, many techniques can dramatically increase the speed of Python programs, such as parallelisation using MPI, high-performance scientific libraries and fast array processing using NumPy. Although there have been many studies of Python performance on Intel processors, there have been few investigations on other architectures such as AMD EPYC and GPUs. In 2021, EPCC will have access to these architectures via the new UK HPC National Tier-1 Supercomputer ARCHER2 and the Tier-2 system Cirrus.
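To illustrate the kind of speed-up that array processing can offer, here is a minimal sketch comparing an explicit Python loop with NumPy slicing. It is not taken from the project code: the function names, the 64x64 grid and the four-point averaging (Jacobi-style) update are assumptions chosen because such stencils are typical of simple CFD solvers.

```python
import numpy as np

def jacobi_step_loops(psi):
    """One four-point averaging sweep over the interior, using Python loops."""
    m, n = psi.shape
    new = psi.copy()  # boundary values are left unchanged
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            new[i, j] = 0.25 * (psi[i - 1, j] + psi[i + 1, j]
                                + psi[i, j - 1] + psi[i, j + 1])
    return new

def jacobi_step_numpy(psi):
    """The same sweep expressed with array slices instead of loops."""
    new = psi.copy()  # boundary values are left unchanged
    new[1:-1, 1:-1] = 0.25 * (psi[:-2, 1:-1] + psi[2:, 1:-1]
                              + psi[1:-1, :-2] + psi[1:-1, 2:])
    return new

# Hypothetical test grid; a real simulation would set physical boundary values.
rng = np.random.default_rng(0)
grid = rng.random((64, 64))

# Both versions should agree to floating-point tolerance.
assert np.allclose(jacobi_step_loops(grid), jacobi_step_numpy(grid))
```

The sliced version pushes the inner loops into compiled NumPy code, which is usually much faster than the interpreted loops; the actual gain depends on grid size and hardware and would need to be measured on each target system.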
A Summer of HPC project in 2020 developed an optimised parallel Python version of an existing C program which performs a Computational Fluid Dynamics (CFD) simulation of fluid flow in a cavity, including an initial GPU implementation. Studies were done on the previous UK National HPC system, ARCHER, and on Cirrus.
The UK National HPC service ARCHER2, a Cray Shasta system with 750,000 CPU-cores, has recently been launched, so we are very interested in its performance characteristics. This year’s project will involve investigating performance on ARCHER2, which has very different processing nodes (128-core AMD) from the Intel systems studied previously. There is also the option of continuing the GPU investigation on Cirrus.

Sample output of existing program showing turbulent flow in a cavity
Project Mentor: Dr. David Henty
Project Co-mentor: Dr. Mario Antonioletti
Site Co-ordinator: Catherine Inglis
Participants: Alejandro Dinkelberg, Jiahua Zhao
Learning Outcomes:
The students will develop their knowledge of Python programming and learn how to compile and run programs on a range of leading HPC systems. They will also learn how to use GPUs for real scientific calculations.
Student Prerequisites (compulsory):
Ability to program in one of these languages: Python, C, C++ or Fortran. A willingness to learn new languages.
Student Prerequisites (desirable):
Ability to program in Python.
Training Materials:
Material from EPCC’s Python for HPC course or the PRACE Python MOOC.
Workplan:
Task 1: (1 week) – SoHPC training week
Task 2: (2 weeks) – Understand functionality of existing parallel C and Python codes and make initial port to new HPC platform.
Task 3: (3 weeks) – Measure baseline performance on new HPC platforms.
Task 4: (2 weeks) – Investigate performance optimisations and write final report.
Final Product Description:
Benchmarking results for Python performance on a range of parallel machines;
Recommendations for how to improve Python performance on AMD EPYC processors;
Optimisation of a GPU-enabled parallel Python application.
Adapting the Project: Increasing the Difficulty:
The project can be made harder by investigating advanced optimisation techniques such as cross-calling from Python to other compiled languages such as C, C++ or Fortran.
Adapting the Project: Decreasing the Difficulty:
The project can be made simpler by considering only one of the target platforms, or by considering CPU-only versions and omitting the GPU work.
Resources:
Access to all HPC systems can be given free of charge by EPCC.
Organisation:
EPCC