Performance of Python programs on new HPC architectures

Project reference: 1908

Python is widely used in scientific research for tasks such as data processing, analysis and visualisation. However, it is not yet widely used for large-scale modelling and simulation on high-performance computers because of its poor performance: Python is designed primarily for ease of use and flexibility, not for speed. Nevertheless, many techniques can dramatically increase the speed of Python programs, such as parallelisation using MPI, calls to high-performance scientific libraries, and fast array processing with the NumPy library. Although there have been many studies of Python performance on Intel processors, there have been few investigations on other architectures such as ARM64. EPCC has recently installed an HPE Catalyst system, a parallel computer built entirely from ARM64 processors. This project will involve converting a simple parallel program (already written in C and Fortran) to Python, measuring its performance on the Catalyst system and comparing it with standard Intel machines. The aim will then be to improve the performance using standard techniques. Although these approaches are well known, Python performance on the new ARM64 processors is not well understood, so the results will be very interesting.
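To illustrate why array processing with NumPy matters, here is a minimal sketch of one Jacobi sweep over a 2-D grid, the kind of stencil update a cavity-flow solver might perform. It assumes a square grid whose boundary values stay fixed; the function names are hypothetical and this is not the project's existing C/Fortran code. The first version loops element by element in the interpreter; the second expresses the same update with array slices, so the inner loops run in NumPy's compiled code.

```python
import numpy as np


def jacobi_step_loops(psi):
    """One Jacobi sweep using explicit Python loops (slow in pure Python)."""
    new = psi.copy()  # boundary values are carried over unchanged
    m, n = psi.shape
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            # each interior point becomes the average of its four neighbours
            new[i, j] = 0.25 * (psi[i - 1, j] + psi[i + 1, j]
                                + psi[i, j - 1] + psi[i, j + 1])
    return new


def jacobi_step_numpy(psi):
    """The same sweep expressed with shifted NumPy slices (vectorised)."""
    new = psi.copy()
    new[1:-1, 1:-1] = 0.25 * (psi[:-2, 1:-1] + psi[2:, 1:-1]
                              + psi[1:-1, :-2] + psi[1:-1, 2:])
    return new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    psi = rng.random((66, 66))
    # both versions should give the same result
    print(np.allclose(jacobi_step_loops(psi), jacobi_step_numpy(psi)))  # prints True
```

Timing both functions on larger grids typically shows the sliced version running much faster, which is the effect the project would measure on Intel and ARM64 systems.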

Sample output of the existing program, showing turbulent flow in a cavity

Project Mentor: Dr. David Henty

Project Co-mentors: Dr. Oliver Brown and Dr. Magnus Morton

Site Co-ordinator: Ben Morse

Learning Outcomes:
The student will develop their knowledge of Python programming and learn how to compile and run programs on a range of leading HPC systems.

Student Prerequisites (compulsory):
Ability to program in one of these languages: Python, C, C++ or Fortran. A willingness to learn new languages.

Student Prerequisites (desirable):
Ability to program in Python.

Training Materials:
Material from EPCC’s Python for HPC course.


  • Task 1: (1 week) – SoHPC training week
  • Task 2: (2 weeks) – Understand the functionality of the existing C/Fortran code and make an initial port to Python
  • Task 3: (1 week) – Measure baseline performance on Intel and ARM systems
  • Task 4: (2 weeks) – Investigate performance optimisations and write a final report
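Measuring baseline performance (Task 3) means timing the same code on each system and recording which architecture it ran on. The following is a minimal sketch using only the standard library; `benchmark` and `sum_of_squares` are hypothetical names, and the kernel is merely a stand-in for the ported solver. Reporting the best and median of several runs, rather than a single measurement, reduces the effect of system noise.

```python
import platform
import statistics
import time


def benchmark(func, *args, repeats=5):
    """Run func(*args) several times; return (best, median) wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings), statistics.median(timings)


def sum_of_squares(n):
    """Hypothetical pure-Python kernel standing in for the ported solver."""
    total = 0.0
    for k in range(n):
        total += k * k
    return total


if __name__ == "__main__":
    # platform.machine() reports e.g. 'x86_64' on Intel and 'aarch64' on ARM64 Linux
    arch = platform.machine()
    best, median = benchmark(sum_of_squares, 1_000_000)
    print(f"{arch}, Python {platform.python_version()}: "
          f"best {best:.4f} s, median {median:.4f} s")
```

Running the same script on each machine gives directly comparable numbers tagged with the architecture and Python version.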

Final Product Description:

  • Development of a parallel Python application;
  • Benchmarking results for Python performance on a range of parallel machines;
  • Recommendations for how to improve Python performance on ARM processors.

Adapting the Project: Increasing the Difficulty:
The project can be made harder by investigating advanced optimisation techniques such as cross-calling from Python to compiled languages. Alternative languages such as Julia could also be considered.

Adapting the Project: Decreasing the Difficulty:
The project can be made simpler by considering a serial rather than a parallel code.

Access to all HPC systems can be given free of charge by EPCC.


