August 7th, 5:30 am, Copenhagen Central Station. I spent the weekend exploring the city along with my roommate Alessandro and my friend Antti, who traveled from Jülich to visit us. Among other activities, we walked the helical corridor to reach the top of the Rundetaarn, or Round Tower, and attended an outdoor chill-out music festival. Unfortunately, it was time for Antti to take his train, so we said goodbye and promised to meet soon. Since the day was young, Alessandro and I decided to keep biking and enjoy the sunrise by the waterside. What better place for an excursion than the Langelinie promenade and the world-famous statue of The Little Mermaid? Soon we were enjoying the view, clearing our minds while the city was slowly waking up.
10 am, Niels Bohr Institute. With a cup of coffee in my hand, I was ready to address the challenges of the day. But “what is the project you are working on?”, you may ask. As I explained in a previous post, my summer mission is to improve the performance of the ocean modeling framework Versatile Ocean Simulation (Veros). Implemented in pure Python, it suffers from a heavy computational cost for large models and requires the assistance of Bohrium, an automatic parallelization framework that acts as a high-performance alternative to the scientific computing library NumPy, targeting CPUs, GPUs and clusters. However, Bohrium can only do so much; even when it outperforms NumPy, simulations are still time-consuming. It is clear that an investigation of the underlying issues behind the performance of Veros is essential.
Naturally, during the initial period, my time was mostly spent studying both the Veros documentation and the respective source code. I configured the environment for running existing model setups and experimented with the available settings. Running local tests and benchmarks of various models and backends, including NumPy and Bohrium, helped me familiarize myself with the basic internal components and the hands-on workflow of the ocean simulations.
The golden rule of optimization is: never optimize without profiling the code. Thus, I embarked on a quest to execute large-scale simulations, from hundreds of thousands to millions of elements, in order to compare the performance with and without Bohrium and to analyze the running times of the core methods to identify possible bottlenecks. Such benchmarks are computationally intensive, so I scheduled ocean models of different sizes to run on a university cluster for long periods of time. As expected, Bohrium becomes more efficient than NumPy if the setup contains at least a million elements. Among the most time-consuming methods in the core of Veros are isoneutral mixing and advection, even though they are straightforward number-crunching routines that Bohrium should, in theory, accelerate to a greater degree.
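To make the "profile first" rule concrete, here is a minimal sketch of how per-function running times can be collected with Python's built-in cProfile module. It is not the actual benchmark setup: `advection_step`, `mixing_step` and `run_model` are hypothetical toy stand-ins written in plain Python, not Veros routines, and the sizes are far smaller than a real ocean model.

```python
import cProfile
import io
import pstats


def advection_step(field):
    """Hypothetical stand-in for a number-crunching routine (not Veros code)."""
    n = len(field)
    # Upwind-style difference; index -1 wraps around to the last element.
    return [field[i] - 0.1 * (field[i] - field[i - 1]) for i in range(n)]


def mixing_step(field):
    """Hypothetical stand-in for a second routine (not Veros code)."""
    n = len(field)
    # Three-point average with periodic wrap-around.
    return [(field[i - 1] + field[i] + field[(i + 1) % n]) / 3.0 for i in range(n)]


def run_model(size, steps):
    """Toy time loop that calls both routines on every step."""
    field = [float(i % 7) for i in range(size)]
    for _ in range(steps):
        field = advection_step(field)
        field = mixing_step(field)
    return field


def profile_model(size, steps, top=5):
    """Run the toy model under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = run_model(size, steps)
    profiler.disable()

    # Collect the report in a string instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_model(size=10_000, steps=20)
    print(report)
```

Sorting by cumulative time puts the routines that dominate the run at the top of the report, which is the kind of ranking that points to candidates for optimization.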
In other words, Bohrium makes a difference when the number of elements exceeds a certain threshold, but there is still much room for improvement. To that end, there are two possible plans of action: replace certain slow parts with hand-optimized, parallelized code, or modify the existing NumPy code so that Bohrium can accelerate it more effectively. For the former, I am working on porting methods to Cython by adding static types and compiler directives, in order to move away from the dynamic nature of Python and speed up these specific segments independently of Bohrium. I also intend to take advantage of Cython’s OpenMP support to enable parallel processing via threads. For the latter, I need to study the internal details of Bohrium and examine the C code it produces for these time-intensive methods.
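As a rough illustration of the second option, restructuring NumPy code, the sketch below writes the same one-dimensional stencil update twice: once as an explicit Python loop and once as whole-array slice arithmetic. The function names and the stencil itself are hypothetical examples, not Veros code, and the premise that an array backend such as Bohrium handles the sliced form better is an assumption that would have to be confirmed by benchmarks.

```python
import numpy as np


def diffuse_loop(field, kappa):
    """Explicit Python loop over interior points (hypothetical example)."""
    out = field.copy()
    for i in range(1, field.shape[0] - 1):
        out[i] = field[i] + kappa * (field[i - 1] - 2.0 * field[i] + field[i + 1])
    return out


def diffuse_sliced(field, kappa):
    """Same update expressed as whole-array slice arithmetic."""
    out = field.copy()
    # Each slice covers all interior points at once; boundaries stay unchanged.
    out[1:-1] = field[1:-1] + kappa * (field[:-2] - 2.0 * field[1:-1] + field[2:])
    return out


if __name__ == "__main__":
    field = np.linspace(0.0, 1.0, 1_000) ** 2
    difference = np.abs(diffuse_loop(field, 0.1) - diffuse_sliced(field, 0.1))
    print("largest difference between the two versions:", difference.max())
```

Both versions compute the same values; the sliced form simply hands the work to the array library in a few large operations instead of many scalar ones.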
There are multiple possible paths to be considered and tested as solutions to the problem. In the coming weeks I will attempt to implement and compare them, making sure that the performance gains from Bohrium are not affected by these changes. On the contrary, Bohrium should benefit from the code restructuring. Besides working on the project, I plan to visit more beautiful sights of Copenhagen and write a dedicated blog post about the city if there is enough time!