### Parallel *k*-means clustering in computational astrophysics

**Location:** Niels Bohr Institutet, University of Copenhagen (UCHP), Copenhagen, Denmark

**Project Abstract:**

Most dynamical astrophysical problems can only be investigated in detail using large computer models.

Particle-In-Cell (PIC) models have gained widespread use in astrophysics as a means of understanding detailed plasma dynamics, particularly in collisionless plasmas, where non-linear instabilities can play a crucial role in launching plasma waves and accelerating particles. PIC models often employ hundreds of billions of computational ‘particles’ to represent the astrophysical plasma, for example electrons, ions and photons. Although such plasmas are, for all practical purposes, collisionless on the timescale of the phenomenon in question, some astrophysical plasmas require a collision term to be modeled explicitly.

For example, in gamma-ray bursts (GRBs), high-energy electrons collide with high-energy photons (gamma-rays), both carrying energies more than a million times that of visible light. Theory tells us that an important scattering process in this case is *Compton scattering*.

Modeling Compton scattering in extremely optically thin plasmas requires particle splitting to be included in the PIC model, which leads to exponential growth in the number of computational particles and, eventually, to memory overflow.
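To see why the growth is exponential, assume purely for illustration that every computational particle splits into two at each scattering step *s* (the factor of two and the starting count are assumptions, not properties of a specific code). Starting from *N*₀ particles:

$$
N_s = N_0 \, 2^{s}, \qquad N_0 = 10^{11},\ s = 10 \ \Rightarrow\ N_{10} = 1.024 \times 10^{14}
$$

That is a thousandfold increase after only ten splitting steps; even at only 48 bytes per particle (six double-precision components), it amounts to roughly 4.9 PB of particle data.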

To contain this exponential growth, the PIC many-particle model needs a layer that can reduce the number of particles at any given time step of the simulation. Since the particles reside in 6-dimensional phase space (3 spatial + 3 velocity components per particle), this is a genuine data compression problem in 6D.
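The project text leaves the particle representation open. As a minimal sketch in C, assuming each computational particle is stored as six doubles (position, then velocity) plus a statistical weight, the squared 6D distance that *k*-means minimizes could be written as below. The `Particle` type, its `w` field, `dist2_6d` and the per-dimension `scale` factors are all hypothetical. The scaling is included because positions and velocities have different units, so an unscaled Euclidean distance would be dominated by whichever has the larger numerical range.

```c
#include <stddef.h>

#define DIM 6 /* x, y, z, vx, vy, vz */

/* One computational particle in 6D phase space (hypothetical layout).
 * The weight w records how many physical particles it represents. */
typedef struct {
    double p[DIM]; /* p[0..2] = position, p[3..5] = velocity */
    double w;      /* statistical weight */
} Particle;

/* Squared Euclidean distance in 6D phase space. Each component is
 * divided by a (hypothetical) scale factor, e.g. a box length for
 * positions and a thermal speed for velocities, so that both halves
 * of phase space contribute comparably. */
double dist2_6d(const double a[DIM], const double b[DIM],
                const double scale[DIM])
{
    double s = 0.0;
    for (size_t d = 0; d < DIM; d++) {
        double diff = (a[d] - b[d]) / scale[d];
        s += diff * diff;
    }
    return s;
}
```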

One way to compress the number of particles is by brute force: for every *N* particles, a relaxation method is needed to reduce the number to *M*, with *M* ≪ *N*. One such framework is the family of *k*-means clustering algorithms.

To determine the optimal way of compressing the particle ensemble in the PIC plasma models, we can resort to 6-dimensional *k*-means clustering.
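A minimal serial sketch of 6D *k*-means in C, assuming particles are stored as a flat array of six doubles each with a per-particle weight (the layout, the weights and the names `kmeans_merge` and `assign_nearest` are hypothetical). It uses Lloyd's algorithm, the standard heuristic for *k*-means, and treats each final centroid as a merged macro-particle whose weight is the summed weight of its members:

```c
#include <float.h>  /* DBL_MAX */
#include <stddef.h> /* size_t */

#define DIM 6 /* 3 position + 3 velocity components */

/* Assignment step: label each particle with its nearest centroid
 * (squared 6D Euclidean distance). Returns the number of changed labels. */
static size_t assign_nearest(const double *x, size_t n,
                             const double *c, size_t m, int *label)
{
    size_t changed = 0;
    for (size_t i = 0; i < n; i++) {
        double best = DBL_MAX;
        int best_j = 0;
        for (size_t j = 0; j < m; j++) {
            double d2 = 0.0;
            for (size_t d = 0; d < DIM; d++) {
                double diff = x[i * DIM + d] - c[j * DIM + d];
                d2 += diff * diff;
            }
            if (d2 < best) {
                best = d2;
                best_j = (int)j;
            }
        }
        if (label[i] != best_j) {
            label[i] = best_j;
            changed++;
        }
    }
    return changed;
}

/* Weighted Lloyd iterations: merge n weighted particles x (n*DIM values)
 * into m >= 1 particles c (m*DIM values, initial guess supplied by the
 * caller) with summed weights cw (m values). label receives n cluster
 * indices. Returns the number of iterations performed. */
int kmeans_merge(const double *x, const double *w, size_t n,
                 double *c, double *cw, size_t m, int *label, int max_iter)
{
    int iters = 0;
    for (size_t i = 0; i < n; i++)
        label[i] = -1; /* no particle assigned yet */

    while (iters < max_iter) {
        size_t changed = assign_nearest(x, n, c, m, label);

        /* Update step: merged weight = sum of member weights. */
        for (size_t j = 0; j < m; j++)
            cw[j] = 0.0;
        for (size_t i = 0; i < n; i++)
            cw[label[i]] += w[i];

        /* Merged coordinates = weight-averaged member coordinates.
         * Empty clusters (weight 0) keep their previous centroid. */
        for (size_t j = 0; j < m; j++)
            if (cw[j] > 0.0)
                for (size_t d = 0; d < DIM; d++)
                    c[j * DIM + d] = 0.0;
        for (size_t i = 0; i < n; i++)
            for (size_t d = 0; d < DIM; d++)
                c[(size_t)label[i] * DIM + d] += w[i] * x[i * DIM + d];
        for (size_t j = 0; j < m; j++)
            if (cw[j] > 0.0)
                for (size_t d = 0; d < DIM; d++)
                    c[j * DIM + d] /= cw[j];

        iters++;
        if (changed == 0)
            break; /* labels stable: converged to a local optimum */
    }
    return iters;
}
```

Because each centroid is the weight-averaged position and velocity of its members, the total weight and the total weighted velocity (momentum, for particles of a single species) are preserved up to rounding, while the velocity spread inside each cluster, and with it some kinetic energy, is lost. Lloyd's algorithm only converges to a local optimum, so the result depends on the initial centroids.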

The first part of the proposed project is to develop a parallel *k*-means clustering implementation that utilizes multi-core processors. The second part is to develop a high-performance implementation that utilizes a cluster of multi-core nodes using MPI.
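For the shared-memory part, both Lloyd steps parallelize naturally over particles. A minimal OpenMP sketch under the same assumed data layout (the function name `lloyd_step_omp` and the caller-provided scratch buffer `sum` are hypothetical), using a `reduction` clause for the changed-label count and OpenMP 4.5 array-section reductions for the per-cluster sums. Compiled without OpenMP support, the pragmas are ignored and the function runs serially.

```c
#include <float.h> /* DBL_MAX */

#define DIM 6 /* x, y, z, vx, vy, vz */

/* One weighted Lloyd iteration parallelized with OpenMP.
 * x: n*DIM coordinates, w: n weights, c: m*DIM centroids (updated in
 * place), cw: m cluster weights (output), sum: m*DIM scratch doubles,
 * label: n cluster indices (updated). Returns the number of changed labels. */
long lloyd_step_omp(const double *x, const double *w, long n,
                    double *c, double *cw, double *sum, long m, int *label)
{
    long changed = 0;

    /* Assignment step: particles are independent, so threads split the
     * particle range; 'changed' is combined with a sum reduction. */
    #pragma omp parallel for reduction(+:changed) schedule(static)
    for (long i = 0; i < n; i++) {
        double best = DBL_MAX;
        int best_j = 0;
        for (long j = 0; j < m; j++) {
            double d2 = 0.0;
            for (int d = 0; d < DIM; d++) {
                double diff = x[i * DIM + d] - c[j * DIM + d];
                d2 += diff * diff;
            }
            if (d2 < best) {
                best = d2;
                best_j = (int)j;
            }
        }
        if (label[i] != best_j) {
            label[i] = best_j;
            changed++;
        }
    }

    for (long k = 0; k < m * DIM; k++)
        sum[k] = 0.0;
    for (long j = 0; j < m; j++)
        cw[j] = 0.0;

    /* Update step: each thread accumulates into private copies of
     * sum and cw, which OpenMP adds together when the loop ends. */
    #pragma omp parallel for reduction(+:sum[0:m * DIM], cw[0:m])
    for (long i = 0; i < n; i++) {
        long j = label[i];
        cw[j] += w[i];
        for (int d = 0; d < DIM; d++)
            sum[j * DIM + d] += w[i] * x[i * DIM + d];
    }

    /* New centroid = weighted mean; empty clusters keep the old one. */
    for (long j = 0; j < m; j++)
        if (cw[j] > 0.0)
            for (int d = 0; d < DIM; d++)
                c[j * DIM + d] = sum[j * DIM + d] / cw[j];

    return changed;
}
```

Array-section reductions give each thread its own copy of `sum` and `cw`, avoiding atomics at the cost of extra memory per thread; the floating-point summation order, and hence the last bits of the result, may vary between runs. For the MPI part, one common pattern is to give each rank a slice of the particles, run the same local accumulation, and combine `sum`, `cw` and the change count with `MPI_Allreduce` using `MPI_SUM`, so that every rank computes identical centroids.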

**Project Mentor:** Mads Kristensen

**Site Co-ordinator:** Brian Vinter

**Learning Outcomes:**

The student will learn how to:

- Parallelize an NP-hard problem on a shared-memory machine
- Go from shared-memory to distributed-memory parallelism
- Develop using threads, OpenMP, and MPI
- Use heuristics to solve an NP-hard problem

**Student Prerequisites (Compulsory)**:

Basic Linux skills

Elementary knowledge of parallel computing

**Student Prerequisites (Desirable):**

Experience with OpenMP and MPI

**Training Materials:**

http://en.wikipedia.org/wiki/K-means_clustering

https://computing.llnl.gov/tutorials/openMP/

**Project Application Reference:** Denmark – UHCP – K-Means Clustering in Astrophysics

**Applications are now closed**