The project I have laid my hands on goes under the fancy title of

Hybrid Monte Carlo Method for Matrix Computation on P100 GPUs

Now most will understand the ending, but not the beginning of the above. Thus I’ll endeavor to decipher the statement.

In principle the main goal of the project is to accelerate the computation of a so-called preconditioner using Markov Chain Monte Carlo methods. The latter will then be used to accelerate the solution of linear systems of equations by means of iterative methods.

“Monte Carlo” already suggests some meddling of chance in the process but the remainder of the statement is still cryptic. Still, all shall be explained in due course. First a brief digression:

### Linear systems of equations

The workhorse of scientific and everyday computing, whether in computer games or a simulation of the universe, *they are everywhere.*

A linear system is a set of equations, as depicted in the figure on the right where the unknowns (*x,y,z*) appear untainted by any function (i.e., not as *x*^{2}). In the figure one can also see the process of transforming a system of equations into a shorthand notation using matrices and vectors – something one may remember from high school. The matrix-vector formulation is more common in science.

Every linear system can be solved, in principle, using the well-known Gaussian elimination procedure. The method is very expensive, requiring ~*N*^{3} arithmetic operations (with *N* being the number of equations) and is thus far too expensive for many applications. In general, engineering and scientific problems yield linear systems with millions or billions of equations. Solving them using direct methods would slow the progress of technology to a crawl.

Hence one resorts to iterative methods. Methods which compute, step-by-step, an increasingly refined approximation of a solution using basically only matrix multiplications. Since matrices from real problems are often sparse (i.e., most of their entries are zero), the cost of iterative methods is a lot less than for direct methods.

### Preconditioners – the why and how.

Alas! *There is no such thing as a free lunch*. The drawback of iterative methods is that the number of steps to reach an acceptable solution depends on the system matrix (A in the above image). The role of a preconditioning matrix is thus to accelerate the convergence of an iterative method. To achieve this it should approximate the inverse of the system matrix well and be easy to compute.

Suffice it to say that the study of preconditioners is difficult enough to warrant a dedicated field of research in mathematics. Many simple and elaborate preconditioners already exist.

### Our place at the frontier of science

Since the problem of convergence is not a new one, much has been done in the field of preconditioning already. Generally, however, the preconditioners suffer from bad parallelizability – an undesirable illness in today’s world of high performance computing.

The method which I shall implement in due course may be used to compute a rough approximation to an inverse matrix to be used as a preconditioner. Furthermore, the method is cheap and being a Monte Carlo method, should in principle scale well, i.e., the computation should require less time if more computers are used.

To achieve the desired acceleration, theoretical analysis and practical implementation utilizing the enormous computational resources of the GPU will have to go hand-in-hand.

The random component we will utilize has a twist, however:

### Markov Chain Monte Carlo

**What is a Markov Chain?** For now, the example in the accompanying figure shall suffice. It is a graph (a collection of nodes and edges) where each node represents some abstract state and the edges provide the possible ways (and their probabilities) for a state to change.

We can compute a solution of a linear system essentially by creating a Markov chain (read: graph) from the system matrix, picking a state at random and performing a random walk on the graph. That is, starting in a state at each time step (i.e., every minute) we may choose some path (edge) available to us, flip a coin and based on the outcome and the weight of the edge decide whether we go along or not. If we do this for a sufficiently long time and use a certain formula we can compute entries of the preconditioner based on the nodes we visit.

A (hopefully) more satisfying explanation, will have to wait for a follow-up post.

### Summa-summarum

**The goal is clear:**

- Accelerate iterative methods by creating a method which is able to quickly compute a good preconditioner using Markov chain Monte Carlo implemented and optimized for the NVIDIA P100 GPUs.
- Compare the performance of the resulting implementation with other state-of-the-art methods.
- Investigate additional stochastic methods with the same goal in mind.

[…] algorithms. Business hasn’t been too good lately, mostly because of all these new competing faces on the scene and because libraries like OpenCL make it too easy to write parallel code for all […]