Parallelising Scientific Python applications

Example output from serial CFD solver.

Project reference: 1609

EPCC is one of the major providers of training in high performance computing (HPC) in Europe, offering a range of well-established courses for users of HPC throughout the UK and Europe. As a PRACE Advanced Training Centre (PATC), we offer a regular programme of courses in many aspects of HPC and advanced computing.

The use of Python in scientific computing is fast becoming popular. The nature of the Python language enables rapid prototyping of code as well as a unified scientific workflow: from generating simulation data to visualising that data and creating high quality images for publication.

We currently offer a Scientific Python course that introduces Python beginners to key Python packages, such as NumPy, Matplotlib and SciPy, that are needed for the development of scientific applications in Python. The course culminates in learners developing a serial Computational Fluid Dynamics (CFD) application code. The main aim of this project is to extend the course by parallelising the example applications within it. This would primarily be done using MPI and the mpi4py Python package, although other ways of introducing parallelisation will also be considered. By ensuring that the application codes can run on ARCHER, the UK’s National Supercomputing Service, the student will gain experience of using a large HPC facility.
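As a rough illustration of the style of parallelisation envisaged, the sketch below splits a loop across MPI ranks with mpi4py and combines the partial results on rank 0. It is a minimal sketch under stated assumptions rather than course material: the workload (a sum of squares), the helper names `local_range` and `partial_sum`, and the serial fallback used when mpi4py is not installed are all hypothetical and chosen only for illustration.

```python
# Minimal sketch: split a loop across MPI ranks and combine partial results.
try:
    from mpi4py import MPI  # needs an MPI library and the mpi4py package

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Assumption for illustration: behave as a single rank so the script
    # still runs serially where mpi4py is not available.
    comm, rank, size = None, 0, 1


def local_range(n_items, rank, size):
    """Return the [start, stop) slice of n_items owned by this rank."""
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def partial_sum(start, stop):
    """Hypothetical workload: sum of squares over the local slice."""
    return sum(i * i for i in range(start, stop))


n_items = 1000
start, stop = local_range(n_items, rank, size)
local = partial_sum(start, stop)

# Combine the partial sums on rank 0; in the serial fallback there is
# nothing to combine, so the local value is already the total.
total = comm.reduce(local, op=MPI.SUM, root=0) if comm is not None else local

if rank == 0:
    print(f"sum of squares below {n_items}: {total}")
```

With mpi4py installed, a script like this would typically be launched with something like `mpiexec -n 4 python sum_squares.py` (the file name is hypothetical); the exact launcher and job submission procedure depend on the HPC system.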

Initial target applications include a simple 2D CFD solver and the computation of the Mandelbrot set. However, there are many other applications that could be tackled, e.g. a cellular automaton traffic model or an image processing application.
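The Mandelbrot set is a natural first target because each pixel can be computed independently. The sketch below shows one possible decomposition, assuming contiguous blocks of image rows are handed to worker processes. It uses the standard library's `concurrent.futures` purely as one example of an alternative to MPI; the image size, the iteration limit and all function names are assumptions made for illustration, not taken from the course code.

```python
from concurrent.futures import ProcessPoolExecutor

# Illustrative parameters (assumptions, not values from the course).
WIDTH, HEIGHT, MAX_ITER = 60, 30, 50


def escape_count(c, max_iter):
    """Iterations before |z| exceeds 2 under z -> z*z + c (max_iter if never)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def mandelbrot_rows(row_start, row_stop):
    """Compute escape counts for image rows [row_start, row_stop)."""
    rows = []
    for j in range(row_start, row_stop):
        y = -1.0 + 2.0 * j / (HEIGHT - 1)
        rows.append([
            escape_count(complex(-2.0 + 3.0 * i / (WIDTH - 1), y), MAX_ITER)
            for i in range(WIDTH)
        ])
    return rows


def row_blocks(height, n_workers):
    """Split the rows into contiguous [start, stop) blocks, one per worker."""
    base, extra = divmod(height, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def mandelbrot_parallel(n_workers=4):
    """Farm row blocks out to worker processes and reassemble the image."""
    starts, stops = zip(*row_blocks(HEIGHT, n_workers))
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        blocks = pool.map(mandelbrot_rows, starts, stops)
        return [row for block in blocks for row in block]


if __name__ == "__main__":
    # Print a coarse text rendering: '#' marks points that never escaped.
    for row in mandelbrot_parallel():
        print("".join("#" if n == MAX_ITER else "." for n in row))
```

The same row-block decomposition would carry over to an MPI version, with each rank computing its own block and the results gathered on one rank. Equal-sized blocks are the simplest choice, although rows near the set take longer to compute, so the load may be unevenly balanced.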

There is also scope to create an animation of the CFD application and to apply what is learnt in extending the Scientific Python course to other training courses.

Project Mentor: Dr Neelofer Banglawala

Site Co-ordinator: Catherine Inglis

Student: Marta Cudova

Learning Outcomes:
The student will learn in detail about Python and how it is used in scientific applications, parallel Python and the many options available, Python on HPC facilities, as well as how to profile, debug, test and document code as part of learning Best Practices for Software Development.

Student Prerequisites (compulsory): 
A strong programming background with an interest in both HPC and scientific computing.

Student Prerequisites (desirable): 
Experience in Python and MPI, and/or the willingness to learn them. Some profiling experience would also be advantageous.

Training Materials:
These will be provided to the successful student once they accept the placement.

Workplan:
Work package 1 : (1 week) – SoHPC training week
Work package 2 : (2 weeks) – Become familiar with Python and the Scientific Python course material. Profile the serial training applications and determine how parallelisation can be implemented in them. Identify opportunities for visualisation.
Work package 3 : (3 weeks) – Implement parallelisation using mpi4py and/or alternatives investigated in Work package 2.
Work package 4 : (2 weeks) – Debug, test and profile code, produce documentation for their extended application and present results to EPCC staff.
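
The profiling step in the workplan could start from something as simple as the sketch below, which runs a function under the standard library's `cProfile` and prints the most expensive calls. The `relax` function is a hypothetical stand-in for a serial kernel, not the course's CFD code, and `profile_call` is a helper introduced here only for illustration.

```python
import cProfile
import io
import pstats


def relax(grid, sweeps):
    """Hypothetical stand-in for a serial kernel: repeated 1D averaging."""
    for _ in range(sweeps):
        interior = [0.5 * (grid[i - 1] + grid[i + 1])
                    for i in range(1, len(grid) - 1)]
        grid = [grid[0]] + interior + [grid[-1]]
    return grid


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(relax, [0.0] * 50 + [1.0], 200)
print(report)
```

A report like this indicates where the serial code spends its time, which helps decide which loops are worth parallelising first.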

Final Product Description: 
The final product will be a suite of parallelised scientific applications, specifically designed and extended for training purposes. We would plan to produce videos of those applications that have graphical output, e.g. the CFD solver. If successful, the applications would also be run on Wee ARCHIE, which is a small parallel system made of Raspberry Pi computers specifically designed for EPCC’s outreach activities at science festivals etc.

Adapting the Project: Increasing the Difficulty:
A good student could consider ways other than MPI of introducing parallelisation to the course examples. They could also investigate creating an animation of the core CFD example and look at whether it may be possible to parallelise this. They will also have the opportunity to investigate the parallelisation of other serial Python applications.

Resources:
All required resources will be provided by EPCC, including a suitable development environment and access to the UK National Supercomputer ARCHER and to Wee ARCHIE.

Organisation:
Edinburgh Parallel Computing Centre (University of Edinburgh)
EPCC
