Parallel boundary point method: Matlab has surrendered to C++!
After a little over a month of coding I can finally state it: Matlab has been knocked out! By both the serial and the parallel versions of the C++ code, actually. I am satisfied with that. Since I started in July I have learnt a lot of things, and I will try to retrace some of them.
One of the main lessons goes like this: “code, make it work, code some more…optimize it later”. One of my weaknesses is that I like to work things out in my head and see the whole solution up front. Probably not a good habit in computing. I have to admit that it didn’t take me long to achieve a satisfactory parallelization. What took longer was setting my mind to a parallel mode and no longer thinking in serial.
I started to see improvements in the project once I managed to link the correct version of SuperLU, a sparse direct solver. Before that, the sparse linear system was being solved with LAPACK (which actually deals with dense matrices) and the code was simply underperforming. I saw an even bigger improvement after switching to Eigen’s sparse solver. The only question was how to ping-pong data between Armadillo and Eigen. Copying an Armadillo vector into an Eigen object before solving the system in every iteration, and then copying the result back, was not an option, since we are talking about huge vectors (more than 500k entries). What I really wanted was to reuse the memory owned by Armadillo as an Eigen type. So I listened to Master Yoda (check the picture, and if you are wondering what RTFM means, check here), and the good news is that Eigen objects can be mapped onto raw buffers. How? Take a look at the picture below, where I am mapping two Eigen vectors onto already existing Armadillo vectors (rhs and y). Note that I am using the iterator returned by .begin(), which I already talked about in my previous post.
The biggest improvement came the moment I linked Armadillo against OpenBLAS instead of the traditional BLAS and LAPACK libraries. OpenBLAS is a high-performance, multi-threaded replacement for BLAS. The only precaution is to set the environment variable OPENBLAS_NUM_THREADS to 1. Otherwise OpenBLAS will spawn as many threads as there are cores on a node: the requested MPI processes get spawned (that is your program), and each MPI process then spawns additional threads (that is the BLAS)…a recipe for a performance disaster.
But this is still serial mode. What about parallelism? This version of the code is oriented towards semidefinite programs (SDPs) that have a block-diagonal structure: the matrices that define the SDP can be split into smaller sub-matrices along the diagonal. The boundary point method is illustrated in the picture below (for more info check the published article by Povh, Rendl, and Wiegele), and the red section is repeated in every inner iteration as many times as there are blocks. This is where parallelization kicks in. The serial code just loops through this part, whereas the parallel version spawns a number of MPI processes based on the number of blocks, and each process computes its blocks independently. The results are then gathered before entering a new outer iteration. Right now the testing phase is in progress. The workload is well balanced among the processes, but of course I do not expect the parallel code to scale linearly with the number of processes. In fact it does not so far, clearly because of communication costs.
This is more or less the state of things after five weeks. Less than three weeks until the end of the programme now. My project is not very visual, so I need to come up with an idea for the final presentation. Besides the project, I am now enjoying Ljubljana during my second month of SoHPC. During the week I usually walk around town and have a swim at one of the local swimming pools. At the weekend we have the opportunity to explore some more. For instance, Ben and I visited Zagreb and spent a night there. On Sunday, right before travelling back to Ljubljana, we wanted to visit the Museum of Illusions. Unfortunately Google Maps brought us to an entirely different place, so we had to give up on the idea…literally an illusion.
I also returned to Postojna to visit the main cave. They are right when they say that Slovenia is like Swiss cheese: there are more than 11,000 caves in this country, and the Postojna cave is the most famous one. Can you believe it had electricity 12 years before the capital, Ljubljana, did? Amazing. The tour takes an hour and a half and the temperature underground is a constant 10 degrees. The visit is breathtaking.