K-Means Clustering in Astrophysics

Parallel k-means clustering in computational astrophysics

Location: Niels Bohr Institutet, University of Copenhagen (UCHP), Copenhagen, Denmark

Project Abstract:

Most dynamical astrophysical problems can only be investigated in detail using large computer models.

Particle-In-Cell (PIC) models have gained widespread use in astrophysics as a means to understand detailed plasma dynamics, particularly in collisionless plasmas, where non-linear instabilities can play a crucial role in launching plasma waves and accelerating particles. PIC models often employ hundreds of billions of computational ‘particles’ to represent the constituents of the astrophysical plasma, for example electrons, ions and photons. Although such plasmas are, for all practical purposes, collisionless on the timescale of the phenomenon in question, certain astrophysical plasmas require a collision term to be modeled explicitly.

For example, in gamma-ray bursts (GRBs), high-energy electrons collide with high-energy photons (gamma rays), both carrying energies more than a million times that of visible light. Theory tells us that an important scattering process in this case is Compton scattering.

Modeling Compton scattering in optically extremely thin plasmas requires particle splitting to be included in the PIC code, which leads to exponential growth in the number of computational particles and, eventually, to memory consumption that exceeds the available memory.

To contain the exponential growth, a layer is needed in the (PIC) many-particle model which can reduce the particle number at any given time step of the simulation. Since, in this description, the particles reside in 6-dimensional space (3 spatial + 3 velocity components per particle), the problem is a genuine data compression problem in 6D.

One procedure for compressing the number of particles is by brute force: for every N particles, a relaxation method is needed to reduce the number to M, with M<<N. One such framework is the family of K-means clustering algorithms.

In order to determine the optimal way of compressing the particle ensemble in the PIC plasma models, we can resort to performing 6-dimensional K-means clustering.
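
As an illustration of the idea, the following is a minimal sketch of Lloyd's algorithm, the standard k-means heuristic, applied to points in 6D. It is not the project's implementation. The function names (`kmeans`, `nearest`, `squared_distance`), the random initialisation, the fixed iteration limit, and the treatment of all particles as equally weighted are assumptions made for this example. Each returned centroid can be read as one merged particle, and its count as the number of original particles it replaces.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, m, iterations=20, seed=0):
    """Reduce N points to m centroids with Lloyd's algorithm.

    Returns (centroids, counts), where counts[j] is the number of
    points assigned to centroid j in the last assignment step.
    Assumes all particles carry equal weight (an assumption of this sketch).
    """
    rng = random.Random(seed)
    # Initialise centroids from m distinct input points.
    centroids = [list(p) for p in rng.sample(points, m)]
    dim = len(points[0])
    counts = [0] * m
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(m)]
        counts = [0] * m
        # Assignment step: attach every point to its nearest centroid.
        for p in points:
            j = nearest(p, centroids)
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous position.
        new_centroids = [
            [s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
            for j in range(m)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, counts
```

Because every centroid is the mean of the points assigned to it, the count-weighted mean of the centroids equals the mean of the original points. For equally weighted particles, the merged set therefore preserves the mean position and mean velocity of the original ensemble. Lloyd's algorithm only finds a local optimum, which is why it counts as a heuristic for the underlying NP-hard problem.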

The first part of the proposed project is to develop a parallel k-means clustering implementation that utilizes multi-core processors. The second part is then to develop a high-performance implementation that utilizes a cluster of multi-core processors using MPI.
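
One common way to parallelize a k-means iteration is to split the particles into chunks, let each worker compute per-cluster partial sums and counts for its own chunk, and then combine the partial results in a reduction. The sketch below shows that decomposition with a thread pool from the Python standard library. It is an illustration under stated assumptions, not the project's design: the names `partial_sums` and `parallel_update` and the `workers` parameter are hypothetical, and in CPython pure-Python threads are not expected to give a real speed-up, so the code demonstrates the structure only. The same pattern maps onto an OpenMP reduction on a shared-memory machine, or onto an all-reduce across MPI ranks on a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(chunk, centroids):
    """Per-cluster coordinate sums and counts for one chunk of points."""
    m, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(m)]
    counts = [0] * m
    for p in chunk:
        # Nearest centroid by squared Euclidean distance.
        j = min(range(m),
                key=lambda c: sum((x - y) ** 2
                                  for x, y in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_update(points, centroids, workers=4):
    """One k-means iteration: chunked assignment, then a reduction."""
    size = max(1, (len(points) + workers - 1) // workers)
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: partial_sums(c, centroids), chunks))

    # Reduction: add up the partial sums and counts from all workers.
    m, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(m)]
    counts = [0] * m
    for part_sums, part_counts in partials:
        for j in range(m):
            counts[j] += part_counts[j]
            for d in range(dim):
                sums[j][d] += part_sums[j][d]

    # New centroid = mean of assigned points; empty clusters stay put.
    new_centroids = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(m)
    ]
    return new_centroids, counts
```

Only the small arrays of sums and counts (M clusters times 6 dimensions) need to be exchanged between workers in each iteration, not the particles themselves. This is what makes the scheme attractive when moving from shared memory to distributed memory.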

Rendering of internal structure of a gamma-ray burst, looking into the shock front. This is taken from a recent supercomputer simulation and is uniquely modeled employing a particle-in-cell model containing several billion particles.

Example of exponential growth in particle number as a burst of photons propagates through a quiescent circumburst medium (plasma)

Project Mentor: Mads Kristensen
Site Co-ordinator: Brian Vinter

Learning Outcomes:

The student will learn how to:

  • Parallelize an NP-hard problem for a shared-memory machine
  • Go from shared-memory to distributed-memory parallelism
  • Develop using threads, OpenMP, and MPI
  • Use heuristics to solve an NP-hard problem

Student Prerequisites (Compulsory):

Basic Linux skills
Elementary knowledge of parallel computing

Student Prerequisites (Desirable):

Experience with OpenMP and MPI

Training Materials:

http://en.wikipedia.org/wiki/K-means_clustering
https://computing.llnl.gov/tutorials/openMP/

Project Application Reference:  Denmark – UHCP – K-Means Clustering in Astrophysics

Applications are now closed
