Cude colors on a phine grid

Project reference: 1716

Simulations of Lattice Quantum Chromodynamics (the theory of quarks and gluons) are used to study the properties of strongly interacting matter. They can be used, for example, to calculate properties of the quark-gluon plasma, a phase of matter that existed a few milliseconds after the Big Bang (at temperatures higher than a trillion degrees Celsius). Such simulations take up a large fraction of the available supercomputing resources worldwide.

These simulations require the repeated computation of solutions of extremely sparse linear systems. Different methods are used to solve these systems, such as Krylov-based solvers (e.g. CG or FGMRES) or multigrid methods (MG), which combine a generic solver on a fine grid with another solver on a coarsened grid (lattice).
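As a rough illustration of what a Krylov-based solver such as CG does, the following sketch solves a small sparse system without ever storing the matrix. It is only a toy under stated assumptions: the operator `apply_laplacian_1d` is a hypothetical stand-in (a 1D Laplacian stencil) and not the lattice operator used in LQCD, and the names `cg_solve`, `dot` and `Vec` are chosen here for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Stand-in sparse operator (an assumption, not the LQCD operator):
// y = A x with A = tridiag(-1, 2, -1), which is symmetric positive definite.
void apply_laplacian_1d(const Vec& x, Vec& y) {
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        const double left  = (i > 0)     ? x[i - 1] : 0.0;
        const double right = (i + 1 < n) ? x[i + 1] : 0.0;
        y[i] = 2.0 * x[i] - left - right;
    }
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Unpreconditioned conjugate gradient for A x = b, starting from x = 0.
// Stops when ||r|| <= tol * ||b|| or after max_iter iterations.
// Returns the number of iterations performed.
int cg_solve(const Vec& b, Vec& x, double tol, int max_iter) {
    assert(x.size() == b.size());
    const std::size_t n = b.size();
    std::fill(x.begin(), x.end(), 0.0);
    Vec r = b;      // residual r = b - A x, with x = 0
    Vec p = b;      // search direction
    Vec Ap(n);
    double rr = dot(r, r);
    const double stop = tol * tol * rr;  // compare squared norms
    int it = 0;
    while (it < max_iter && rr > stop) {
        apply_laplacian_1d(p, Ap);
        const double alpha = rr / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++it;
    }
    return it;
}
```

Each iteration costs one operator application plus a few vector updates and dot products, which is why these few routines tend to dominate the run time of such solvers.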

Depending on personal preference, the student will either tune and scale the most critical parts of a specific method, or attempt to optimize for a specific architecture in the algorithm space.

In the former case, the student can choose among different target architectures, ranging from Intel Xeon Phi (KNC/KNL) and Haswell (AVX2) to GPUs (OpenPOWER), which are available in different installations at the institute. To that end, he/she will benchmark the method and identify the relevant kernels. He/she will analyse the performance of the kernels, identify performance bottlenecks, and develop strategies to resolve these, if possible taking similarities between the target architectures (such as SIMD vectors) into account. He/she will optimize the kernels and document the steps taken in the optimization as well as the performance results achieved.
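Benchmarking a kernel can start as simply as timing it repeatedly and converting the best time into an achieved memory bandwidth. The sketch below does this for a simple vector update; the kernel `axpy` and the helpers `best_time_seconds`, `axpy_bytes` and `gigabytes_per_second` are illustrative names chosen here, not part of any project code, and the byte count assumes no cache reuse.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Example kernel: y <- a*x + y.
// Per element it reads x[i] and y[i] and writes y[i].
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
}

// Runs `kernel` `repeats` times and returns the fastest wall-clock time
// in seconds. Taking the minimum reduces the influence of system noise.
template <typename Kernel>
double best_time_seconds(Kernel&& kernel, int repeats) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        kernel();
        const auto t1 = std::chrono::steady_clock::now();
        const double dt = std::chrono::duration<double>(t1 - t0).count();
        if (dt < best) best = dt;
    }
    return best;
}

// Bytes moved by one axpy call on n elements: two loads and one store
// per element (an assumption: nothing is served from cache).
double axpy_bytes(std::size_t n) {
    return 3.0 * sizeof(double) * static_cast<double>(n);
}

// Achieved bandwidth in GB/s (1 GB = 1e9 bytes).
double gigabytes_per_second(double bytes, double seconds) {
    return bytes / seconds / 1.0e9;
}
```

A call such as `best_time_seconds([&] { axpy(0.5, x, y); }, 10)` followed by `gigabytes_per_second(axpy_bytes(x.size()), t)` gives a number that can be compared with the memory bandwidth of the target architecture to judge whether the kernel is memory bound.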

In the latter case, the student will, after getting familiar with the architectures, explore different methods by either implementing them or using those that have already been implemented. He/she will explore how the algorithmic properties match the hardware capabilities. He/she will test the total performance achieved and study bottlenecks, e.g. using profiling tools. He/she will then test the method at different scales and document the findings.

In either case, the student will be embedded in an extended infrastructure of hardware, computing, and benchmarking experts at the institute.


Project Mentor: Dr. Stefan Krieg

Site Co-ordinator: Ivo Kabadshow

Learning Outcomes:

The student will familiarize himself/herself with important new HPC architectures: Intel Xeon Phi, Haswell, and OpenPOWER. He/she will learn how the hardware functions at a low level and use this knowledge to optimize software. He/she will use state-of-the-art benchmarking tools to achieve optimal performance for the kernels found to be relevant in the application.

Student Prerequisites (compulsory): 

  • Programming experience in C/C++

Student Prerequisites (desirable): 

  • Knowledge of computer architectures
  • Basic knowledge of numerical methods
  • Basic knowledge of benchmarking
  • Computer science, mathematics, or physics background

Training Materials:


Paper on MG with introduction to LQCD from the mathematician’s point of view:

Introductory text for LQCD:


Week – Work package

  1. Training and introduction
  2. Introduction to architectures
  3. Introductory problems
  4. Introduction to methods
  5. Optimization and benchmarking, documentation
  6. Optimization and benchmarking, documentation
  7. Optimization and benchmarking, documentation
  8. Generation of final performance results. Preparation of plots/figures. Submission of results.

Final Product Description: 

The end product will be a student educated in the basics of HPC, together with optimized kernel routines and/or optimized methods. These results can be easily illustrated in appropriate figures, as is routinely done by PRACE and HPC vendors. Such plots could be used by PRACE.

Adapting the Project: Increasing the Difficulty:

  1. Different kernels require different levels of understanding of the hardware and of optimization strategies. For example, it may or may not be necessary to optimize memory access patterns to improve cache utilization. A particularly able student may work on such a kernel.
  2. Methods differ greatly in terms of complexity. A particularly able student may choose to work on more advanced algorithms.
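To make the remark on memory access patterns concrete, one common transformation is to switch from an array-of-structures layout to a structure-of-arrays layout, so that a loop reads contiguous real and imaginary parts with unit stride, which tends to suit SIMD units. The sketch below is an illustration only: the type names `FieldAoS` and `FieldSoA` and the functions `to_soa`, `scale_aos` and `scale_soa` are hypothetical, and whether the second layout is actually faster depends on the kernel and the architecture and has to be measured.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Array of structures: real and imaginary parts are interleaved in memory.
using FieldAoS = std::vector<std::complex<double>>;

// Structure of arrays: all real parts are contiguous, as are all imaginary parts.
struct FieldSoA {
    std::vector<double> re;
    std::vector<double> im;
};

// Copy an interleaved field into the split layout.
FieldSoA to_soa(const FieldAoS& in) {
    FieldSoA out;
    out.re.reserve(in.size());
    out.im.reserve(in.size());
    for (const auto& z : in) {
        out.re.push_back(z.real());
        out.im.push_back(z.imag());
    }
    return out;
}

// Multiply every element by the complex scalar a (interleaved layout).
void scale_aos(FieldAoS& f, std::complex<double> a) {
    for (auto& z : f) z *= a;
}

// Same operation on the split layout: each array is traversed with unit stride.
void scale_soa(FieldSoA& f, std::complex<double> a) {
    assert(f.re.size() == f.im.size());
    const double ar = a.real();
    const double ai = a.imag();
    for (std::size_t i = 0; i < f.re.size(); ++i) {
        const double re = f.re[i];
        const double im = f.im[i];
        f.re[i] = ar * re - ai * im;
        f.im[i] = ar * im + ai * re;
    }
}
```

Both functions compute the same result, so the two layouts can be swapped and timed against each other on the target hardware.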


The student will have his/her own desk in an open-plan office (12 desks in total) or in a separate office (2-3 desks in total), with a fully equipped workstation for the duration of the program, and will get access to (and computation time on) the HPC hardware required for the project. A range of performance and benchmarking tools is available on site and can be used within the project. No further resources are required.

Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH

Posted in Projects 2017
