# The Fast Multipole Method: A Coulomb solver for the masses

There is time for laughs and there is also time for being serious and getting things done. In this blog post I would like to summarize what I have learned about the Fast Multipole Method during this programme. This theoretical introduction is not meant to be exhaustive, nor 100% mathematically proven. The focus was placed on providing a simple explanation of the general functioning of the algorithm so that its high-level workflow can be understood.

“Verily, this vichyssoise of verbiage veers most verbose, so let me simply add that it’s my very good honor to” try to explain this, although you may hate me in the end. Take a seat, coffee, enough food, fasten your seat belts and please do not smoke. If you are not in the mood for reading but you enjoy listening to this kind of stuff, I recommend that you watch my video in which I explain the concepts in a friendly manner:

#### Introduction

The simulation of dynamical systems of particles, usually under the influence of physical properties, such as gravitational or electrostatic forces, is a crucial issue in scientific research. This problem is commonly referred as the N-body problem. The main obstacle of that problem is the absence of an analytical solution when the number of bodies, i.e., N , is greater than three total bodies. However, those simulations can be solved using an iterative numerical approach. Therefore, those methods compute the total force exerted on each particle and their potentials at discrete time steps, then compute the resulting velocities and update the positions of the particles accordingly.

A typical example is the simulation of a system of particles with electric charges $q_i$. The force $\boldsymbol{F}_{ij}$ of a particle $j$ with charge $q_j$ acting on a particle $i$ with charge $q_i$ is defined by the following expression:

$\boldsymbol{F}_{ij} = \displaystyle\frac{q_iq_j}{|\boldsymbol{r}_{ij}|^3}\boldsymbol{r}_{ij}\,,$

where $\boldsymbol{r}_{ij}$ is the distance vector between particles $i$ and $j$.

Given that, the total force $\boldsymbol{F}_i$ acting on each particle $i$ can be expressed as the following summation:

$\boldsymbol{F}_{i} = \displaystyle\sum_{j=1}^N \displaystyle\frac{q_iq_j}{|\boldsymbol{r}_{ij}|^3}\boldsymbol{r}_{ij} \quad (i \neq j)\,.$

As we can observe, calculating the forces acting on a single particle has a computational complexity of $\mathcal{O}(N)$ since we have to compute all pairwise interactions of the current particle with the rest of the system. Therefore, a naive algorithm for computing all forces $\boldsymbol{F}_i$ exhibits $\mathcal{O}(N^2)$ complexity.

The remainder of the simulation steps have a complexity of $\mathcal{O}(N)$ since computing the velocities by using the forces just needs to iterate once over each particle and the same applies for the position update step. In this regard, the quadratic complexity may be negligible for a small number of particles, but $\emph{interesting and useful simulations}$ often involve huge particle ensembles so the simulation will be considerably slowed down to a point in which it is non-viable to apply this kind of summation method. Fortunately, due to the increasing importance of N-body simulations for research purposes, fast summation methods have been developed throughout the latter years.

Arguably, the most remarkable one is the $\emph{Fast Multipole Method (FMM)}$, introduced by Greengard and Rokhlin [1].

This method is considered $\emph{one of the TOP10 algorithms of the past century}$ [2] together with other notable ones as the $\emph{Monte-Carlo method}$, the $\emph{Quicksort algorithm}$ or the $\emph{Fast Fourier transform}$. The mathematician Barry Cipra perfectly summarized the functioning of the algorithm [3]: “This algorithm overcomes one of the biggest headaches of $N$-body simulations: the fact that accurate calculations of the motions of $N$ particles interacting via gravitational or electrostatic forces (think stars in a galaxy, or atoms in a protein) would seem to require $\mathcal{O}(N^2)$ computations. The fast multipole algorithm gets by with $\mathcal{O}(N)$ computations. It does so by using multipole expansions (net charge or mass, dipole moment, quadrupole, and so on) to approximate the effects of a distant group of particles on a local group. A hierarchical decomposition of space is used to define ever-larger groups as distances increase.”

In this post, I would like to provide a brief description of the FMM. We will review its core aspects and its mathematical foundations. We will also describe briefly the role of the mathematical operators used for shifting and converting the multipole expansions. At last, we will explain how the application of rotation-based operators to those expansions is able to reduce the complexity of the aforementioned operators from $\mathcal{O}(p^4)$ to $\mathcal{O}(p^3)$.

As we previously mentioned, the FMM is a fast summation method which is able to provide an approximate solution to the calculation of forces, potentials or energies within a given precision goal, namely $\epsilon$. The method exhibits linear computational complexity, mainly thanks to its sophisticated algorithmic structure. The core aspects of the FMM are the following ones: spatial grouping of particles, hierarchical space subdivision, multipole expansion of the charges and a special interaction scheme.

#### Spatial Particle Grouping

The main idea behind the FMM can be informally described as the following intuitive concept: the effect that particles, which are close to the observation point (also named target), have over that target particle is dominant compared to the effect produced by remote particles. It is important to remark that the contribution of those remote particles is not zero. Otherwise, we will be resorting to what is known as cut-off scheme. Those methods present a $\mathcal{O}(N)$ complexity, but lack the error control capabilities of the FMM. That is the main reason why cut-off based methods are not an option for accurate simulations.

Consider the particle distribution shown in Figure 1, in which we want to compute the remote interactions between two different clusters: target, with $m$ particles, and source, with  $n$ particles. The effect of intra-cluster interactions are not taken into account in this example for the sake of simplicity. A traditional summation method requires computing $m \cdot n$ interactions. The FMM is based on the idea that a remote particle from a spatial cluster will have almost the same influence on the target particle as another one from the same cluster, given that the inter-cluster distance is large enough. This means that all particles in the remote cluster can be grouped together into a pseudo-particle. By doing this, we are effectively reducing the amount of interactions to $m$, since each target particle is tested against the remote pseudo-particle once. This grouping scheme can also be used in reverse, by grouping the target cluster thus requiring $n$ interactions. In addition, we can also group both source and target clusters, reducing the computation to a single interaction.

FIgure 1. (a) direct interactions of the particles of one cluster with all particles in the other. (b) interaction via source pseudo-particle. (c) interaction via target pseudo-particle. (d) interaction with both source and target pseudo-particles.

#### Hierarchical Space Subdivision

In order to implement the spatial grouping idea, it is needed to subdivide the simulation space to generate particle groups. The FMM uses a space decomposition scheme based on a recursive decomposition in cubic boxes. Firstly, the simulation spatial domain is enclosed in a three-dimensional box. Then, the cube is subdivided by planes parallel to its faces which pass through the box centroid. This decomposition technique generates eight different child cubes which are recursively subdivided. This hierarchy of cubes is arranged in a tree, namely an octree. The space is subdivided until a predefined depth $d$ on that tree. Figure 2 shows an example of this recursive subdivision visualized on a 2D plane.

Figure 2. Space subdivision using an octree.

#### Special Interaction Scheme

As we previously mentioned, the FMM takes advantage of grouped remote particles to reduce the amount of interactions to be computed. This interaction scheme is the main characteristic of the FMM algorithm. Figure 3 shows the interaction matrix of a distribution of $16$ particles using the direct summation method and the resulting interaction matrix by using the FMM.

Figure 3. (a) interaction matrix of 16particles (blue bullets) using a direct summation scheme. (b) depicts the interaction matrix via the FMM scheme, the dark-greyed cells represent direct neighbors. Each square of a matrix represents a particle-particle or multipole-multipole interaction. Crossed out cells represent interactions from particles with themselves which are not calculated due to the underlying singularity. (c) depicts the direct interaction of a 1D distribution of 16 particles (first line) and the different FMM interaction sets.

As we can observe, the direct summation scheme computes a total of $N(N-1)/2$ interactions. This is due to the matrix symmetry and the dropped out self-interactions. On the other hand, in the FMM scheme the source and target particles are grouped as the distance increases thus dropping the number of interactions significantly.

The immediate neighbors of a certain cell or box are also called its near field and their contributions to the target particle are calculated with the classical direct summation method. The FMM handles a parameter called well-separateness and referred as $ws$ that represents the maximum distance (in boxes) to the target box for a cell to be considered part of its near field, i.e., it defines the size of the direct neighborhood. In this regard, for an arbitrary value of $ws$, there are $(2\cdot ws + 1)^3 - 1$ boxes in the near field of any other box. Figure 4 shows different examples of near fields for a box with non-periodic boundaries.

Figure 4. Near fields (gray boxes) for a box A with varying ws.

#### Multipole Expansion of Charges

In order to accurately depict the charge distribution of the created particle groups, the FMM leverages to multipole and local expansions. A group could be represented by reducing all the charges to a common point, but this is not precise enough to represent the charge distribution inside that cluster. Instead of doing that, the group is represented by a series of increasingly higher-order multipoles. The series is truncated to ta certain order $p$, this parameter controls the final accuracy and balances the computational load.

The solution for the potential $\Phi$ of a particle $P$ at the spherical coordinates $(r, \Theta, \phi)$ consists of two separate solutions: a multipole expansion with moments $\omega_{lm}$ for a local part inside the sphere, and a Taylor-like expansion with moments $\mu_{lm}$ for the outside.

The multipole expansion is defined by the following theorem: suppose that $k$ particles of charges $q_j$ with $j = 1, \dots , k$ are located at the points $\boldsymbol{a}_j = (a_j, \alpha_j, \beta_j), j = 1, \dots , k$ with $|\boldsymbol{a}_j| < \hat{a}$ inside the sphere. Then for any $P = (r, \Theta, \phi) \in \boldsymbol{R}^3$ with $r > \hat{a}$, the potential  is given by:

$\Phi(P) = \displaystyle\sum_{l=0}^{p}\sum_{m=-l}^{l}\omega_{lm}(q,a)M_{lm}(r)$

The Taylor-like expansion is defined by the following theorem: suppose that $k$ particles of charges $q_j, j = 1, \dots, k$ are located at the points $\boldsymbol{R}_j = (r_j, \Theta_j, \phi_j), j = 1, \dots, k$ with $|r_j| > \hat{a}$ outside the sphere. Then for any $P = (a, \alpha, \beta) \in \boldsymbol{R}^3$ with $a < \hat{a}$, the potential is given by:

$\Phi(P) = \displaystyle\sum_{l=0}^{p}\sum_{m=-l}^{l}\mu_{lm}(q,r)O_{lm}(a)$

#### Workflow

Once the core aspects of the FMM have been reviewed, we can put them all together to establish the general workflow of the algorithm. Given a certain separateness criterion $ws$, the multipole order $p$ and the depth of the tree $d$, the FMM consists of the following steps:

• Pass 1: Expand particles into multipole moments $\omega_{lm}$ on the lowest level and translate multipole moments $\omega_{lm}$ up the tree.
• Pass 2: Transform multipole moments $\omega_{lm}$ into local moments $\mu_{lm}$.
• Pass 3: Translate local moments $\mu_{lm}$ down the tree.
• Pass 4: Compute far field contributions: potentials $\Phi_{\text{FF}}$, forces $\boldsymbol{F}_{\text{FF}}$, and energy $E_{\text{FF}}$.
• Pass 5: Compute near field contributions: potentials $\Phi_{\text{NF}}$, forces $\boldsymbol{F}_{\text{NF}}$, and energy $E_{\text{NF}}$.

The actual phases of the algorithm are usually named $\emph{passes}$. The first step is performed by the Particle-to-Multipole (P2M) operator and it is often considered a preprocessing step. The remainder of the first pass corresponds to the Multipole-to-Multipole (M2M) operator, while the second one is done via the Multipole-to-Local (M2L) operator and the third one with the Local-to-Local (L2L) operator.

#### Mathematical Operators

As we previously mentioned, the FMM needs three fundamental mathematical operators during its workflow, namely M2M, M2L, and L2L. Those operators are responsible for shifting the multipole expansions up and down the tree levels, and also to convert remote multipole expansions to local ones at each level. In this section we will briefly review those three operators to provide context for the rotation-based operators.

##### Multipole-to-Multipole

The M2M is a vertical operator which shifts the multipole coefficients up to higher levels of the tree structure. Each box of the 3D tree has eight child boxes in the next lower level. The M2M operator sums up all the moments of the multipole expansions of the child boxes at the center of the parent box. This operator is applied to each level up to the root of the tree. By doing this, each box on every level has a multipole expansion. Since this operator is applied for the first pass, it is also known as $A$.

From a mathematical perspective, each child multipole expansion $\omega^i$ at the center $a_i$ of that child box $i$ is shifted up to the center $a + b$ of its parent box (see Figure 5. The moments $\omega^{i}_{jk}(a_i)$ of each child multipole expansion are shifted by the $A$ operator to produce the moments $\omega^{i}_{lm}(a_i + b_i)$ of the parent’s expansion:

$\omega_{lm}^{i}(a_i + b_i) = \displaystyle\sum_{j=0}^{l}\sum_{k=-j}^{j}A_{jk}^{lm}(b_i)\omega_{jk}^{i}(a_i)$

In addition, all the shifted moments of the eight child boxes are finally added up to conform the multipole expansion at the center of the parent box:

$\omega_{lm}(a+b) = \displaystyle\sum_{i=1}^{8}\omega_{lm}^{i}(a_i + b_i)$

Figure 5. M2M operator. Diagram (a) shows the analytical domain of the A operator on a 2D tree. The centers of a sample child box and the parent one are shown. Diagram (b) depicts the functioning of the M2M operator for a 2D system.

##### Multipole-to-Local

The M2L is an horizontal operator which transforms multipole expansions on a level of the tree into local ones on the same level. The multipole coefficients $\omega_{jk}(a)$ of an external expansion around $a$, are converted into local or Taylor-like coefficients $\mu_{lm}(b-a)$ around $(b-a)$. The multipole moments of the remote expansion are transformed into local ones by means of the M2L operator, also named $B$:

$\mu_{lm}(b-a) = \displaystyle\sum_{j=0}^{\infty}\sum_{k=-j}^{j}B_{jk}^{lm}(b)\omega_{jk}(a)$

Figure 6 depicts the M2L operator. It is important to remark that this operator presents a truncation error due to the need for limiting the first summation range. This is usually constrained so that $j. Additional errors are introduced but analyzing this aspect of the mathematical operators is outside the scope of this work. The reader may refer to [4] for error analysis of the FMM operators.

Figure 6. Diagram (a) shows the analytical domain of the B operator on a 2D tree level. Diagram (b) depicts the functioning of the M2L operator for a 2D system.

##### Local-to-Local

The L2L is a vertical operator which shifts local expansions from a parent box to the centers of its children boxes thus acting over different levels of the tree. The Taylor-like moments $\mu_{jk}(r)$ of a parent box with center at $r$ are shifted down also as local moments $\mu_{lm}(r-b)$ to the center of a child box $(r-b)$. The mathematical definition of this shifting which makes use of the L2L operator, also known as $C$, is defined as follows:

$\mu_{lm}(r-b) = \displaystyle\sum_{j=l}^{p}\sum_{k=-j}^{j}C_{jk}^{lm}(b)\mu_{jk}(r)$

Figure 7 shows the functioning of this operator over a 2D tree, and its analytical domain. This operator also introduces error due to the finite representation of the local expansion (notice that the first summation is limited to $p$ poles). It is also important to remark that no additional errors are introduced, so the M2L or $B$ operator becomes the most error-prone one.

Figure 7. Diagram (a) shows the analytical domain of the C operator on a 2D tree structure. Diagram (b) depicts the functioning of the L2L operator for a 2D system.

##### But, complexity…

All the described operators have a computational complexity of $\mathcal{O}(p^4)$. This fact is considerably harmful when dealing with high-precision simulations in which the order of the multipoles is usually large. Taking into account that the three passes take a considerable amount of the total execution time of a FMM step, it is critical to reduce the cost of the mathemathical operators in order to achieve faster simulations.

#### Rotation-based operators

A set of more efficient operators with $\mathcal{O}(p^3)$ computational complexity scaling were proposed by White and Head-Gordon[5]. The reduced complexity is achieved by rotating the multipole expansions so that the translations or shifts are performed along the quantization axis of the boxes (see Figure 8). This reduces the 3D problem to a 1D one.

Figure 8. Rotation-based operator steps. Step one, (a) The vector R between two boxes A and B. Step two, (b) The coordinate system of the box B is rotated to align it along the z’-axis defined by the quantization direction. Step three, (c) The multipole expansion is translated into a local expansion around the center of A. Step four, (d) The local expansion is rotated back to the original coordinate system.

Along the quantization axis, $\theta=0$ and $\phi=0$ so the operators take an especially simplified form [6]. In order to rotate the multipole or local expansions, Wigner rotation matrices are applied. Two successive rotations are performed: first one about the $z$-axis with a rotation angle $\phi$, followed by another one about the new $y$-axis with a rotation angle $\theta$ (see Figure 9). Those angles correspond to the polar and azimuthal angles of the shift vector $\boldsymbol{d}=(d,\phi,\theta)$, depicted as $R$ in Figure 8.a and 8.c.

The multipole moments of an expansion with respect to a coordinate system which has been rotated twice, first and angle $\phi$ about the $z$-axis and then $\theta$ about the $y$-axis, can be expressed as a linear combination of the moments with respect to the original coordinate system:

$\omega'_{l,m}(a,\alpha,\beta) = \displaystyle\sum_{k=-l}^l \displaystyle\frac{\sqrt{(l-k)!(l+k)!}}{\sqrt{(l-m)!(l+m)!}} d_l^{m,k}(\theta)e^{ik\phi}\omega_{l,k},.$

The same applies for the local moments $\mu'_{l,m}$, expressed as a linear combination of the moments $\mu_{l,m}$ with respect to the original coordinate system:

$\mu'_{l,m}(r,\delta,\gamma) = \displaystyle\sum_{k=-l}^l \displaystyle\frac{\sqrt{(l-m)!(l+m)!}}{\sqrt{(l-k)!(l+k)!}} d_l^{m,k}(\theta)e^{ik\phi}\mu_{l,k}\,.$

In both equations, $d_l^{k,m}$ are the Wigner small $d$-rotation coefficients whose computation falls beyond the scope of this work. A detailed explanation and implementation on how to compute them can be consulted at [6]. The term $e^{ik\phi}$ represents a factor that is needed for each moment to compute the rotation. On a normal implementation, the operator will have to compute both terms on the fly, adding a pre-factor to the $\mathcal{O}(p^3)$ complexity of this operators. However, that pre-factor can be removed by precomputing and reusing the constants.

Figure 9. Rotating the coordinate system (x,y,z) into (x’,y’,z’) for the rotaiton-based operators. Figure reproduced from [6].

And that’s all folks! I hope you find this brief introduction to FMM interesting! All the figures were proudly designed and generated from scratch using LaTeX/Tikz.

#### References

[1] Leslie Greengard and Vladimir Rokhlin. A fast algorithm for particle simulations. Journal of computational physics, 73(2):325–348, 1987.

[2] Jack Dongarra and Francis Sullivan. Guest editors introduction to the top 10 algorithms. Computing in Science Engineering, 2(1):22–23, Jan 2000.

[3] Barry A Cipra. The best of the 20th century: editors name top 10 algorithms. SIAM news, 33(4):1–2, 2000.

[4] Ivo Kabadshow. Periodic boundary conditions and the error-controlled fast multipole method, volume 11. Forschungszentrum Jülich, 2012.

[5] Christopher A. White and Martin Head-Gordon. Rotating around the quartic angular momentum barrier in fast multipole method calculations. The Journal of Chemical Physics, 105(12):5061–5067, 1996.

[6] Simon Pfreundschuh. Implementation of a rotation operator for the fast multipole method. Technical report. Forschungszentrum Jülich, 2014.

EOT.