Multithreading the Multigrid Solver for lattice QCD
Project reference: 1806
This project will involve optimising lattice Quantum Chromodynamics (lattice QCD) codes, which currently run on PRACE Tier-0 and other European petascale supercomputers. Lattice QCD is a method to study Quantum Chromodynamics, the theory which describes how quarks bind to form the protons, neutrons and all other hadrons which make up the visible mass observed in the universe. Within lattice QCD, one simulates the strong interaction, one of the four fundamental forces of nature, directly from the underlying theory.
Using lattice QCD, important fundamental questions can be addressed, such as how the quarks are distributed inside protons and neutrons and what fraction of their intrinsic spin, momentum and helicity is carried by the quarks and gluons. These hadron structure calculations require large computer allocations and run on the world’s largest supercomputers.
Multigrid solvers have enabled simulations of QCD using physical quark masses, a milestone in the field that allows direct contact with experiment and input to searches for new physics. In this project, the software package “DDalphaAMG” will be optimized for novel architectures, including Intel KNL and Skylake processors. DDalphaAMG is a state-of-the-art solver library that implements an algebraic multigrid algorithm for an arbitrary number of levels. Its scalability, however, is limited by the inefficient threading model currently employed. This project will involve optimizing DDalphaAMG to accommodate future HPC trends, which are moving towards architectures with wide vector registers and hundreds of threads per node.
The first step of the project will be an analysis of the scalability of the code, which will introduce the selected student to the code and the environment. The performance when using different parallelization strategies, such as OpenMP combined with MPI, will be quantified. In the final phase, the student will optimize kernel functions by exploiting vector intrinsics and/or optimizing the OpenMP parallelization according to the previous analysis.
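As a rough illustration of the kind of kernel such an optimization targets, the sketch below threads and vectorizes two generic vector operations with OpenMP. It is not DDalphaAMG code: the function names `axpy` and `dot` are hypothetical, and real lattice QCD kernels act on complex spinor fields rather than plain arrays of doubles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: y <- y + a*x, a vector update of the kind an
   iterative solver performs many times per iteration. */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);
    /* Split the loop across threads and ask the compiler to vectorize
       each thread's chunk. Without OpenMP support the pragma is ignored
       and the loop simply runs serially. */
    #pragma omp parallel for simd schedule(static)
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Hypothetical kernel: sum of x[i]*y[i]. Each thread accumulates a
   private partial sum; OpenMP combines them at the end of the loop. */
double dot(size_t n, const double *restrict x, const double *restrict y)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum) schedule(static)
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Compiling with an OpenMP flag (for example `-fopenmp` with GCC) and varying the `OMP_NUM_THREADS` environment variable is one simple way to measure how such a kernel scales with the thread count.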
Multigrid methods efficiently solve large sparse linear systems of equations by first “smoothing” the matrix to be solved and subsequently solving a “coarsened” system as a preconditioner of the original, finer system.
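The smoothing and coarsening steps can be sketched on a much simpler model problem. The example below applies one two-grid cycle to the 1D Poisson equation with zero boundary values, using damped Jacobi as the smoother and an exact solve on the coarse grid. This is a geometric toy example under those stated assumptions, not the algebraic multigrid algorithm of DDalphaAMG, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* r = f - A u for A = tridiag(-1, 2, -1) / h^2 with zero boundary values. */
static void residual(int n, double h, const double *u, const double *f, double *r)
{
    for (int i = 0; i < n; i++) {
        double left  = (i > 0)     ? u[i - 1] : 0.0;
        double right = (i < n - 1) ? u[i + 1] : 0.0;
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h);
    }
}

/* One damped-Jacobi sweep (weight 2/3): u <- u + w * D^{-1} (f - A u). */
static void smooth(int n, double h, double *u, const double *f, double *r)
{
    residual(n, h, u, f, r);
    for (int i = 0; i < n; i++)
        u[i] += (2.0 / 3.0) * (h * h / 2.0) * r[i];
}

/* Exact solve of the coarse tridiagonal system (Thomas algorithm). */
static void coarse_solve(int n, double h, const double *rhs, double *e)
{
    double d = 2.0 / (h * h), o = -1.0 / (h * h);
    double *c = malloc(n * sizeof *c);
    assert(c != NULL);
    c[0] = o / d;
    e[0] = rhs[0] / d;
    for (int i = 1; i < n; i++) {
        double m = d - o * c[i - 1];
        c[i] = o / m;
        e[i] = (rhs[i] - o * e[i - 1]) / m;
    }
    for (int i = n - 2; i >= 0; i--)
        e[i] -= c[i] * e[i + 1];
    free(c);
}

/* One two-grid cycle on n = 2*nc + 1 interior points with spacing h.
   Returns the maximum absolute residual after the cycle. */
double two_grid_cycle(int n, double h, double *u, const double *f)
{
    int nc = (n - 1) / 2;
    double *r  = malloc(n * sizeof *r);
    double *rc = malloc(nc * sizeof *rc);
    double *ec = malloc(nc * sizeof *ec);
    assert(n >= 3 && n % 2 == 1 && r && rc && ec);

    smooth(n, h, u, f, r);                 /* pre-smoothing */
    smooth(n, h, u, f, r);
    residual(n, h, u, f, r);
    for (int j = 0; j < nc; j++)           /* restrict residual (full weighting) */
        rc[j] = 0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2];
    coarse_solve(nc, 2.0 * h, rc, ec);     /* coarse-grid error equation */
    for (int k = 0; k <= nc; k++) {        /* interpolate correction linearly */
        double left  = (k > 0)  ? ec[k - 1] : 0.0;
        double right = (k < nc) ? ec[k]     : 0.0;
        u[2 * k] += 0.5 * (left + right);
        if (k < nc)
            u[2 * k + 1] += ec[k];
    }
    smooth(n, h, u, f, r);                 /* post-smoothing */
    smooth(n, h, u, f, r);

    residual(n, h, u, f, r);
    double max = 0.0;
    for (int i = 0; i < n; i++) {
        double a = r[i] < 0.0 ? -r[i] : r[i];
        if (a > max) max = a;
    }
    free(r); free(rc); free(ec);
    return max;
}
```

Calling `two_grid_cycle` repeatedly on the same `u` reduces the residual by a roughly constant factor per cycle, largely independent of the grid size. A full multigrid solver applies the same idea recursively in place of the exact coarse solve.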
Project Mentor: Constantia Alexandrou
Project Co-mentor: Giannis Koutsou
Site Co-ordinator: Stelios Erotokritou
The summer student will perform runs on PRACE Tier-0 systems and will become familiar with using world-leading supercomputing infrastructures. The student will be trained in exploiting different parallelization strategies and optimizations for modern computing architectures. The training will include an introduction to lattice QCD techniques.
Student Prerequisites (compulsory):
Undergraduate degree in Physics with an above-average grade and good programming experience.
Student Prerequisites (desirable):
Knowledge of theoretical high-energy physics; experience with parallel programming; experience with OpenMP and vectorization.
- Lattice Gauge Theories: An Introduction, Heinz J. Rothe, World Scientific
- Quantum Chromodynamics on the Lattice, Gattringer, Lang, DOI: 10.1007/978-3-642-01850-3, Springer-Verlag Berlin Heidelberg
- Lattice Quantum Chromodynamics, Knechtli, Günther and Peardon, DOI: 10.1007/978-94-024-0999-4, Springer Netherlands
- Week 1
In the first week, the summer student will attend the PRACE SoHPC training program.
- Week 2-4
From the second to the fourth week, the summer student will become familiar with the code and perform first runs on the target machines. This includes compiling the software, optimizing some model parameters and an introduction to multigrid solvers in lattice QCD given by local experts of the group.
- Week 4-7
In the remaining weeks, the summer student will perform OpenMP/MPI performance analysis and begin optimizing the kernel functions.
- Week 8
The student will refine the results, prepare the final presentation and hand over the code to local researchers.
Final Product Description:
The final project result will be an optimized and more scalable code. The results will be provided in a report which summarizes the performance analysis and optimizations carried out.
Graphics will be provided to demonstrate the improvement in performance and scalability of the code, and diagrams will be used to visualize the optimizations carried out.
Adapting the Project: Increasing the Difficulty
The intensity/difficulty of the project can be increased by targeting more advanced optimization strategies for the kernel functions.
Adapting the Project: Decreasing the Difficulty
The intensity/difficulty of the project can be decreased by extending the first phase of the project (parameter space exploration). Here, more weight can be given to parameter optimization for specific lattice sizes.
The summer student will be provided access to PRACE Tier-0 systems equipped with Intel Skylake and KNL processors, such as SuperMUC, Marconi and Jureca (including its booster partition).
In addition, the student will be given access to local infrastructure that includes a small cluster of Xeon Phi (KNC) processors that can be used for prototyping of the kernel functions.
The Cyprus Institute – Computation-based Science and Technology Research Center