Full speed ahead – Accelerating ocean simulations

How will Earth’s climate change in the next century? This is the kind of question climate researchers ask themselves, and the reason they develop global climate models. These models simulate conditions over long periods of time and under various scenarios, accounting for factors such as the atmosphere, the oceans and the sun. Understanding and predicting climate behavior comes at a price, though: it is a computationally intensive task that relies heavily on supercomputing facilities. This is why climate models estimate trends rather than events, and why their results are less detailed than those of comparable models used for weather forecasting. For example, a climate model can predict the average temperature over a decade but not on a specific day. As a result, climate models are not simply scaled up to finer detail as computers get bigger. Instead, climate research depends on extending simulations to hundreds of thousands of years, which creates the need for faster computers to advance the field.

Simulation of waves traveling in the Atlantic
This is where my summer project at the Niels Bohr Institute, University of Copenhagen, comes into play. A number of ocean simulation models are implemented in the Versatile Ocean Simulation (Veros) framework. Aiming to be the Swiss Army knife of ocean modeling, it offers many numerical methods for calculating the state of the ocean at each step of the simulation, and it supports anything from realistic to highly idealized setups. Everything is implemented in pure Python to create an open-source model that is easy to access, use and contribute to. However, choosing a dynamically typed, interpreted language over a compiled one such as Fortran comes at a heavy computational cost, which makes acceleration necessary for larger models.
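To make the cost of interpretation concrete, here is a small illustrative sketch (not Veros code; `smooth_loop` and `smooth_vectorized` are hypothetical names). Both functions compute the same three-point running mean, but the first steps through the array one element at a time in the Python interpreter, while the second hands the whole computation to NumPy's compiled array operations. Timings depend on the machine, so no numbers are claimed here.

```python
import timeit

import numpy as np


def smooth_loop(x):
    """Three-point running mean using an explicit Python loop (interpreted per element)."""
    out = x.copy()
    for i in range(1, len(x) - 1):
        out[i] = (x[i - 1] + x[i] + x[i + 1]) / 3.0
    return out


def smooth_vectorized(x):
    """The same running mean expressed as whole-array NumPy slicing."""
    out = x.copy()
    out[1:-1] = (x[:-2] + x[1:-1] + x[2:]) / 3.0
    return out


if __name__ == "__main__":
    x = np.random.default_rng(0).random(100_000)
    # Both versions must agree before their speed is compared.
    assert np.allclose(smooth_loop(x), smooth_vectorized(x))
    for f in (smooth_loop, smooth_vectorized):
        t = timeit.timeit(lambda: f(x), number=5)
        print(f"{f.__name__}: {t:.3f} s for 5 calls")
```

The endpoints are copied unchanged in both versions, so the two outputs match element for element.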
How can we accelerate climate models? Parallel computation saves the day by letting us take full advantage of the underlying hardware. In particular, GPUs can dramatically boost performance because they contain thousands of smaller cores designed to process tasks in parallel. Similarly, many-core processors such as the Xeon Phi series combine many cores on a single chip, delivering massive parallelism and significant speed-ups.

Niels Bohr Institute, University of Copenhagen
In the case of Veros, the computational gap is bridged mainly through Bohrium, a framework that acts as a high-performance alternative to the scientific computing library NumPy. Bohrium integrates seamlessly with NumPy and lets NumPy code run on CPUs, GPUs and clusters without requiring any code modifications. Bohrium takes care of all the parallelism in the background, so the developer can concentrate on writing a clean, readable ocean model. Moreover, porting the program to another accelerator does not require any code changes, since the parallelization is abstracted away by the framework.
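What "no code modifications" means in practice can be sketched as follows. This assumes Bohrium's NumPy drop-in module is importable as `bohrium` and falls back to plain NumPy when it is not installed. The hypothetical `diffuse` function (not taken from Veros) performs one explicit finite-difference diffusion step on a 2-D grid using only whole-array slicing and no Python loops, which is the style of code an array backend like Bohrium can parallelize transparently.

```python
try:
    import bohrium as np  # assumption: Bohrium's NumPy-compatible drop-in module
except ImportError:
    import numpy as np    # fall back to plain NumPy when Bohrium is not installed


def diffuse(field, kappa, dt, dx):
    """One explicit diffusion step on a 2-D grid (hypothetical example, not Veros code).

    Interior cells are updated with a five-point Laplacian; boundary cells
    are left unchanged. Only whole-array operations are used, so the same
    source runs on either backend.
    """
    out = field.copy()
    lap = (field[2:, 1:-1] + field[:-2, 1:-1]
           + field[1:-1, 2:] + field[1:-1, :-2]
           - 4.0 * field[1:-1, 1:-1]) / dx**2
    out[1:-1, 1:-1] = field[1:-1, 1:-1] + kappa * dt * lap
    return out


if __name__ == "__main__":
    grid = np.zeros((5, 5))
    grid[2, 2] = 1.0  # a single "hot spot" in the middle of the grid
    print(diffuse(grid, kappa=1.0, dt=0.1, dx=1.0))
```

The explicit time step is only stable for small enough `kappa * dt / dx**2`, so the parameters above are illustrative rather than tuned.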
Bohrium is extremely useful, but naturally there are cases where automatic parallelization fails to improve performance significantly. My mission during the next two months is to profile and analyze the execution of the core methods and models of Veros in order to identify the most time-consuming parts of the framework. The next step is to optimize that NumPy code with Cython, adding static type declarations to sidestep the dynamic nature of Python and speed up execution. I also plan to take advantage of Cython’s OpenMP support to enable parallel processing via threads. Finally, the last task of the project is to port these time-intensive methods to accelerators such as GPUs and Xeon Phis, using parallel computing interfaces like PyOpenCL and possibly PyCUDA. Bohrium will detect that these segments were parallelized by the developer and will not attempt to optimize them. In this way, the power of handwritten parallel code is combined with the automatic magic of Bohrium to produce truly accelerated ocean simulations.
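The profiling step can start with nothing more than the standard library's `cProfile` and `pstats` modules. In the minimal sketch below, `hypothetical_step` is a stand-in for one model time step and is not an actual Veros routine. `profile_top` runs a function under the profiler and returns the report sorted by cumulative time, which is a quick way to see where a run spends most of its time.

```python
import cProfile
import io
import pstats

import numpy as np


def hypothetical_step(n=200):
    """Stand-in for one model time step (hypothetical; not a Veros routine)."""
    a = np.random.default_rng(0).random((n, n))
    for _ in range(10):
        # Periodic four-neighbour average built from np.roll shifts.
        a = 0.25 * (np.roll(a, 1, 0) + np.roll(a, -1, 0)
                    + np.roll(a, 1, 1) + np.roll(a, -1, 1))
    return a


def profile_top(func, *args, limit=5):
    """Run func(*args) under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_top(hypothetical_step))
```

For whole scripts, `python -m cProfile -s cumulative script.py` gives a similar report without changing any code.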
Speaking of parallelism, while I make progress on the simulation front, I intend to explore the beautiful city of Copenhagen with my friend Alessandro and collect as much photographic evidence as possible. Stay tuned for more posts on both parts of the Summer of HPC experience!