To Py or not to Py? – Python project update
Hello everybody! I hope you are all well.
As you can see in the featured image, I am really excited about receiving the programme’s t-shirt!
So, in my last blog post I introduced myself and asked you to answer a question in the comments. I am pleased that you took up the challenge and answered. Today I would like to share some information about the project I am working on and my progress so far.
Nowadays Python is becoming increasingly popular, mainly because it is user-friendly and saves a lot of development and debugging time. However, when it comes to performance, lower-level programming languages (languages that are closer to the machine than to the human), such as C, produce programs that run faster. Simulations, visualisations and other computationally heavy programs that run on supercomputers expose Python’s poor performance. So the question is: can Python be optimised to run fast and take advantage of HPC architectures?
Before answering that, we need to understand some basic features of the Python program that we will study.
The program performs a Computational Fluid Dynamics (CFD) simulation of fluid flow in a cavity.
The underlying fluid dynamics problem is a continuous system described by partial differential equations, but for a computer to simulate it, the equations must be mapped onto a grid (discretisation). The solution can then be approximated with the finite difference method, in which the value at each grid point is repeatedly updated from the values at its neighbouring points.
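To make the neighbour-update idea concrete, here is a minimal NumPy sketch. It assumes, for illustration only, that the solver works on a 2D array `psi` and uses a Jacobi iteration of the five-point stencil for Laplace’s equation, where every interior point becomes the average of its four neighbours. The function name `jacobi_step` is hypothetical, and this is not necessarily the exact update used in the project’s code.

```python
import numpy as np


def jacobi_step(psi: np.ndarray) -> np.ndarray:
    """Return a new grid in which each interior point is the average of its
    four neighbours (five-point finite-difference stencil, Jacobi update).
    Boundary values are copied unchanged."""
    new = psi.copy()
    new[1:-1, 1:-1] = 0.25 * (
        psi[:-2, 1:-1]     # neighbour at row i-1
        + psi[2:, 1:-1]    # neighbour at row i+1
        + psi[1:-1, :-2]   # neighbour at column j-1
        + psi[1:-1, 2:]    # neighbour at column j+1
    )
    return new


# Usage: a small 5x5 grid with the top boundary held at 1.0
psi = np.zeros((5, 5))
psi[0, :] = 1.0
for _ in range(100):
    psi = jacobi_step(psi)
```

Using whole-array slices instead of Python loops over `i` and `j` lets NumPy do the arithmetic in compiled code, which is the usual first step when writing this kind of update in Python.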
The program can be parameterized by specifying the variables below:
- Scale Factor – affects the dimensions of the box cavity and consequently the size of the array(s) in which the grid is stored.
- Number of Iterations – the number of steps the algorithm performs; the larger it is, the more accurate the result will be.
- Reynolds number (Re) – sets the fluid’s viscosity (a higher Re corresponds to a lower viscosity), which affects the presence of vortices (whirlpools) in the flow.
The simulation result is visualized with arrows and colors drawn on an image of the grid. The arrows show the direction of the flow at each point, while the colors indicate the fluid’s speed, from blue (low speed) to red (high speed).
There are several techniques for speeding up Python code. The goal of this project is to investigate optimisations for Python programs that run not only on CPUs but also on GPUs.
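The arrows and colors are derived from the velocity field. As a hedged sketch, assuming the solver stores a stream function `psi` on a grid with spacing `h` (an assumption; the post does not say which variables the program stores), the velocity components can be estimated with central differences, and the speed used for the color scale is their magnitude. The function name `velocity_from_stream` is hypothetical.

```python
import numpy as np


def velocity_from_stream(psi: np.ndarray, h: float = 1.0):
    """Estimate velocities at interior grid points from a stream function.

    Assumes axis 0 is the x index i and axis 1 is the y index j, with
    u = d(psi)/dy and v = -d(psi)/dx, both approximated by central differences.
    """
    u = (psi[1:-1, 2:] - psi[1:-1, :-2]) / (2.0 * h)   # x-component (arrow direction)
    v = -(psi[2:, 1:-1] - psi[:-2, 1:-1]) / (2.0 * h)  # y-component (arrow direction)
    speed = np.hypot(u, v)                            # magnitude, mapped to blue..red
    return u, v, speed


# Usage: a stream function that grows linearly in y gives a uniform flow in x
psi = np.tile(np.arange(5.0), (5, 1))
u, v, speed = velocity_from_stream(psi)
```

The returned `u`, `v` and `speed` arrays could then be passed to a plotting routine such as Matplotlib’s `quiver` to draw arrows colored by speed.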
Progress in the first 3 weeks
Some of my early tasks were to study the algorithm, understand the existing Python and C codes, get access to HPC systems and submit my first jobs to the supercomputers.
Currently, I am working on optimising the Python code. The metric we are interested in is the iteration time, obtained by dividing the total time for N iterations by N. I have tried several Python modules that accelerate the calculations and found that the numexpr module works best in our case. The baseline (unoptimised) Python code uses the NumPy module for fast array calculations. However, only the numexpr version of the Python code can compete with C, since numexpr creates fewer temporary arrays and uses multithreading internally.
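The sketch below illustrates both the iteration-time metric and the NumPy-versus-numexpr comparison on a hypothetical stencil update (an average of four neighbours, assumed here for illustration). `step_numpy`, `step_numexpr` and `iteration_time` are illustrative names, not functions from the project; if numexpr is not installed, `step_numexpr` falls back to NumPy so the example still runs.

```python
import time

import numpy as np

try:
    import numexpr as ne  # optional dependency
except ImportError:
    ne = None


def step_numpy(psi: np.ndarray) -> np.ndarray:
    """Baseline: each '+' and the '*' below produce a full-size temporary array."""
    new = psi.copy()
    new[1:-1, 1:-1] = 0.25 * (psi[:-2, 1:-1] + psi[2:, 1:-1]
                              + psi[1:-1, :-2] + psi[1:-1, 2:])
    return new


def step_numexpr(psi: np.ndarray) -> np.ndarray:
    """Same update via numexpr, which compiles the whole expression and
    evaluates it in small blocks (by default on several threads), so the
    intermediate sums do not each allocate a full-size temporary."""
    if ne is None:
        return step_numpy(psi)  # fallback so the sketch runs without numexpr
    new = psi.copy()
    new[1:-1, 1:-1] = ne.evaluate(
        "0.25 * (a + b + c + d)",
        local_dict={"a": psi[:-2, 1:-1], "b": psi[2:, 1:-1],
                    "c": psi[1:-1, :-2], "d": psi[1:-1, 2:]},
    )
    return new


def iteration_time(step, psi: np.ndarray, n_iter: int):
    """Iteration time = total wall-clock time for n_iter steps divided by n_iter."""
    start = time.perf_counter()
    for _ in range(n_iter):
        psi = step(psi)
    total = time.perf_counter() - start
    return total / n_iter, psi
```

For example, `iteration_time(step_numexpr, grid, 100)` returns the average time per iteration together with the final grid. Comparing it with `step_numpy` on a large grid is one way to see the effect of fewer temporaries and multithreading, although the actual speed-up will depend on the machine and the grid size.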
Next goals and conclusion
My next goal is to produce a performance graph for the MPI versions of the code, which I will describe next time, and then to move on to developing an equivalent Python program for GPUs.
Before concluding, my question for you this time is the question in the title:
“To Py or not to Py?”
By that I mean: what is your experience with Python? Have you ever used it? If so, is it your preferred language? Have you ever worried about its performance?
Please feel free to share your thoughts in the comments section.
That’s all for this post. I really hope you found something interesting in it, and I will be happy to see you in one of my future posts.