Benchmarking the Dirac Operator (Part 1)
In this blog post, I will explain the fundamental theory behind simulating fermions on the lattice. In particular, I will discuss how staggered fermions are used to build a kernel for the Staggered Dirac Operator with the Kokkos C++ template library.
Fermions on the Lattice
First, a few words about the setup from the perspective of lattice quantum chromodynamics (Lattice QCD). Instead of the continuum description of 4-dimensional spacetime, we represent spacetime as a 4-dimensional hypercubic lattice, where each lattice point n has integer coordinates n_μ, μ = 0,1,2,3.
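To make the lattice concrete, here is a minimal sketch of how a site n with coordinates n_μ could be mapped to a position in a flat array, and how its neighbours could be found. The names Coord, site_index and neighbour are hypothetical, and both the ordering (n_0 running fastest) and the periodic wrap-around at the lattice boundary are assumptions made for illustration, not something fixed by the text above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper type: four integers, used both for the coordinates
// n_mu of a site and for the lattice extents along mu = 0,1,2,3.
using Coord = std::array<int, 4>;

// Lexicographic index of site n, with n[0] running fastest
// (an assumed convention, chosen only for this sketch).
inline std::size_t site_index(const Coord& n, const Coord& dims) {
    std::size_t idx = 0;
    for (int mu = 3; mu >= 0; --mu) {
        idx = idx * static_cast<std::size_t>(dims[mu])
            + static_cast<std::size_t>(n[mu]);
    }
    return idx;
}

// Neighbour of n shifted by dir = +1 or -1 along direction mu,
// wrapping around periodically at the lattice boundary (assumed).
inline Coord neighbour(Coord n, int mu, int dir, const Coord& dims) {
    n[mu] = (n[mu] + dir + dims[mu]) % dims[mu];
    return n;
}
```

With these helpers, the forward neighbour n+μ̂ is neighbour(n, mu, +1, dims) and the backward neighbour n−μ̂ is neighbour(n, mu, -1, dims).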
Quark fields φ are defined on every lattice site n and are described by Dirac spinors φ(n). Component-wise, the spinor is denoted φ_αi(n), with the Dirac/“spin” index α = 1,2,3,4 and the gauge/“colour” index i = 0,1,2,…,N-1, where N is the dimension of the fundamental representation of the gauge group SU(N). For Lattice QCD the gauge group is SU(3), so i = 0,1,2. Each spinor φ(n) can therefore be thought of as having 12 independent components, which are Grassmann variables.
Applying the staggered transformation to the quark fields diagonalizes the action in the spin indices, splitting it into 4 identical parts (one for each Dirac index). In the staggered action we can therefore keep a single part and reduce the quark field to a three-component vector φ_i(n) with complex entries.
To couple the quark fields to a gauge field U, we introduce directed links, which connect a site n with its neighbour n+μ̂. Associated with each φ(n) are thus link variables U_μ(n), one along each direction μ̂ emanating from site n. These link variables are elements of the gauge group, so for Lattice QCD they are 3×3 complex matrices, with components U_μ^ab(n). Using these, the Staggered Dirac Operator D acting on a quark field φ is built, for every direction μ, from a forward term U_μ(n) φ(n+μ̂) and a backward term U_μ†(n−μ̂) φ(n−μ̂).
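For reference, a standard form of the staggered Dirac operator, following the conventions of Gattringer and Lang, is written out below. The lattice spacing a, the quark mass m and the sign factors η_μ(n) are symbols assumed here rather than defined in the text above, and the overall normalisation may differ from the one used in the kernel.

```latex
\begin{align}
  \chi(n) = (D\phi)(n)
    &= \sum_{\mu=0}^{3} \frac{\eta_\mu(n)}{2a}
       \left[ U_\mu(n)\,\phi(n+\hat\mu)
            - U_\mu^\dagger(n-\hat\mu)\,\phi(n-\hat\mu) \right]
       + m\,\phi(n), \\
  \eta_\mu(n) &= (-1)^{\sum_{\nu<\mu} n_\nu}, \qquad \eta_0(n) = 1 .
\end{align}
```

The sum over μ contains one forward and one backward hop per direction, and in each hop a 3×3 link matrix multiplies a three-component colour vector.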
As we can see in the above equation, for each lattice site n we perform 8 matrix-vector multiplications Uφ, one for each of the eight nearest neighbours of a site on a 4-dimensional hypercubic lattice (one forward and one backward per direction). In the Kokkos kernel we must therefore perform these multiplications at every lattice site to obtain the output quark field χ(n).
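The eight multiplications per site can be sketched as a plain serial loop in standard C++, without Kokkos. This is a reference sketch under stated assumptions, not the kernel used for the benchmarks: it takes lattice spacing a = 1, a massless operator, periodic boundaries and the usual staggered sign factors η_μ(n), and all type and function names (Lattice, ColourVector, apply_staggered_dirac, and so on) are hypothetical.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using ColourVector = std::array<Complex, 3>;                 // phi_i(n)
using ColourMatrix = std::array<std::array<Complex, 3>, 3>;  // U_mu^{ab}(n)
using Coord = std::array<int, 4>;

// Hypothetical lattice helper: periodic boundaries, n[0] runs fastest.
struct Lattice {
    Coord dims;
    std::size_t volume() const {
        std::size_t v = 1;
        for (int d : dims) v *= static_cast<std::size_t>(d);
        return v;
    }
    std::size_t index(const Coord& n) const {
        std::size_t idx = 0;
        for (int mu = 3; mu >= 0; --mu) idx = idx * dims[mu] + n[mu];
        return idx;
    }
    Coord coord(std::size_t idx) const {
        Coord n{};
        for (int mu = 0; mu < 4; ++mu) {
            n[mu] = static_cast<int>(idx % dims[mu]);
            idx /= dims[mu];
        }
        return n;
    }
    Coord shift(Coord n, int mu, int dir) const {
        n[mu] = (n[mu] + dir + dims[mu]) % dims[mu];
        return n;
    }
};

// y += s * U x
inline void add_mat_vec(ColourVector& y, const ColourMatrix& U,
                        const ColourVector& x, double s) {
    for (int a = 0; a < 3; ++a)
        for (int b = 0; b < 3; ++b) y[a] += s * U[a][b] * x[b];
}

// y += s * U^dagger x
inline void add_adj_mat_vec(ColourVector& y, const ColourMatrix& U,
                            const ColourVector& x, double s) {
    for (int a = 0; a < 3; ++a)
        for (int b = 0; b < 3; ++b) y[a] += s * std::conj(U[b][a]) * x[b];
}

// Staggered sign factor eta_mu(n) = (-1)^(n_0 + ... + n_{mu-1}).
inline double eta(const Coord& n, int mu) {
    int sum = 0;
    for (int nu = 0; nu < mu; ++nu) sum += n[nu];
    return (sum % 2 == 0) ? 1.0 : -1.0;
}

// chi(n) = sum_mu eta_mu(n)/2 [ U_mu(n) phi(n+mu) - U_mu^dagger(n-mu) phi(n-mu) ].
// The link U_mu(n) is stored at U[4 * index(n) + mu] (assumed layout).
std::vector<ColourVector> apply_staggered_dirac(
    const Lattice& lat, const std::vector<ColourMatrix>& U,
    const std::vector<ColourVector>& phi) {
    std::vector<ColourVector> chi(lat.volume());
    for (std::size_t s = 0; s < lat.volume(); ++s) {
        const Coord n = lat.coord(s);
        ColourVector acc{};  // starts at zero
        for (int mu = 0; mu < 4; ++mu) {
            const double h = 0.5 * eta(n, mu);
            const std::size_t fwd = lat.index(lat.shift(n, mu, +1));
            const std::size_t bwd = lat.index(lat.shift(n, mu, -1));
            add_mat_vec(acc, U[4 * s + mu], phi[fwd], h);          // forward hop
            add_adj_mat_vec(acc, U[4 * bwd + mu], phi[bwd], -h);   // backward hop
        }
        chi[s] = acc;
    }
    return chi;
}
```

Each iteration of the outer loop reads only the input field and the links and writes only its own chi[s], so the sites are independent of one another. That independence is what allows the site loop to be handed to a parallel execution policy.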
With Kokkos, we can make many choices about how we perform these operations in parallel. The choice of execution policies is critical in maximizing bandwidth. In the next blog post I will post some results from runs performed at the Jülich Supercomputing Centre.
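To see why bandwidth is the quantity to watch, a rough per-site count of memory traffic and floating-point work helps. The sketch below is a back-of-the-envelope estimate under assumptions that are not taken from the post: double-precision complex numbers, uncompressed 3×3 link matrices, no cache reuse between neighbouring sites, and the sign-factor multiplications ignored. All names are hypothetical.

```cpp
#include <cstddef>

// Assumed storage: one complex number = two doubles.
constexpr std::size_t bytes_per_complex = 2 * sizeof(double);
constexpr std::size_t complex_per_vector = 3;  // colour vector phi(n)
constexpr std::size_t complex_per_matrix = 9;  // 3x3 link matrix U_mu(n)
constexpr std::size_t hops_per_site = 8;       // forward + backward in 4 directions

// Every hop loads one link matrix and one neighbouring colour vector.
constexpr std::size_t bytes_read_per_site =
    hops_per_site * (complex_per_matrix + complex_per_vector) * bytes_per_complex;

// One output colour vector chi(n) is stored per site.
constexpr std::size_t bytes_written_per_site =
    complex_per_vector * bytes_per_complex;

// A complex multiply costs 6 real flops, a complex add costs 2.
// One 3x3 matrix-vector product: 9 multiplies and 6 adds.
constexpr std::size_t flops_per_matvec = 9 * 6 + 6 * 2;

// Eight products, plus 7 vector additions to accumulate them.
constexpr std::size_t flops_per_site =
    hops_per_site * flops_per_matvec
    + (hops_per_site - 1) * complex_per_vector * 2;

// Flops performed per byte moved, under the assumptions above.
constexpr double arithmetic_intensity() {
    return static_cast<double>(flops_per_site)
         / static_cast<double>(bytes_read_per_site + bytes_written_per_site);
}
```

With 8-byte doubles this gives 570 flops against 1584 bytes per site, well below one flop per byte. Under these assumptions the kernel is limited by how fast data can be moved rather than by arithmetic, which is why the way sites are distributed over threads matters so much.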
C. Gattringer and C. B. Lang, Quantum Chromodynamics on the Lattice, Springer, Berlin, Heidelberg, 2010.