Ducks and the GPU performance? – Part 1 of 2
Is it possible to account for all the particles in the system without actually iterating through each of them every time? Read this post to get an idea about my summer project and the groundwork for the answer to the title question!
Particle interactions are a common subject of simulations in the fields like Molecular Dynamics and Astrophysics. Imagine computing the forces on each planet in our solar system caused by all the other planets. For each of the planets, one would need to add up the contributions to the total force of all the others, one at a time. Assuming all the data is available, the task is computationally trivial. However, in typical simulations, where the particles are counted in trillions, computing anything with this approach would take a lifetime even on the most sophisticated computing architectures.
A different approach is used in one of the most important algorithms of the 20th century – the Fast Multipole Method (FMM). It allows faster computation of particle interactions by grouping them together. Each group is represented by a single expression called the multipole expansion. Multipole expansion is an approximation of the impact the considered group has on its environment. When computing a force at a certain point, instead of accounting for each particle in the group, only this single expression is considered. That is how the complexity is reduced!
How to Implement Such a Thing?
FMM includes a series of steps, but for brevity, I will focus only on one for now. In real-world simulations, domains of interest are the three dimensional ones. Imagine a cubic box which I will only refer to as a box. This box is divided into a lot of small boxes where each is represented by a single multipole expansion. Consider a point in which we want to compute the potential. Consider the box in which the point is located. It is required to compute the impact of all other boxes onto our box. The impact computation is done by a so called Multipole-to-Local (M2L) operator which is exactly the segment to be improved in my summer project.
Recipe for Speed
Current implementation developed at FZ Jülich rotates the multipole twice followed by the translation into the box of interest. It can be shown that instead of the two rotations we can rotate it in a different manner six times to reduce complexity. Each of these six rotations are individually cheaper than the currently implemented two and when everything is put together, I expect to see some speed gained.
It requires some effort to grasp the maths behind creation, rotation and the shift of multipoles which makes it too broad for the scope of the blog. However, implementation-wise, it comes down to building data structures to represent these concepts and that is exactly where the fun starts! Above you can see how the fancy multipole in the first picture can be represented in the computer. Coefficients of the multipole expansion are stored in each blue box. The first box on the left represents a monopole coefficient, the two boxes stacked on top of each other hold the dipole coefficients etc.
To win exclusive bragging rights, write your idea on how the ducks are linked to the story in the comments section below! Otherwise, you can find the answer in the second part of this post. For the outcome of the new way of rotation, stay tuned for further posts!