An introduction to Kokkos


So, what is Kokkos?

Modern high performance computers have diverse and heterogeneous architectures. For applications to scale and perform well on these modern architectures, they must be re-designed with thread scalability and performance portability as a priority. Orchestrating data structures and execution patterns between these diverse architectures is a difficult problem.

Figure: The Kokkos programming model, broken into its constituent parts. [Source: kokkos.org]

Kokkos is a C++-based programming model that provides abstractions for parallel execution and memory management. Through these abstractions, the user interacts with the underlying backend shared-memory programming models (such as CUDA, OpenMP, C++ threads, etc.) in a unified manner. This minimizes the amount of architecture-specific implementation detail a programmer must be aware of.

So, how is this done?

Kokkos defines 3 main objects to aid in this abstraction: Views, Memory Spaces and Execution Spaces.

Views: A templated C++ class that wraps a pointer to array data (plus some metadata). The rank (number of dimensions) of a view is fixed at compile time, but the size of each dimension can be set at either compile time or runtime. Each view also has a layout, such as column-major or row-major. By default, the layout depends on the memory space in which the view stores its data (e.g. the host CPU, CUDA, etc.).

Figure: The view dev, allocated in the CUDA memory space, with a shallow copy of its metadata held in the default host space (CPU). [Source: Kokkos Lectures, module 2]

Execution Space: A homogeneous set of cores together with an execution mechanism, or in other words a “place to run code”. Examples include Serial, OpenMP, CUDA and HIP. Parallel patterns (such as Kokkos::parallel_for) run on an execution space, usually the DefaultExecutionSpace. This default is fixed at compile time by the backends enabled when Kokkos is built (e.g. by passing -DKokkos_ENABLE_XXX=ON to CMake). However, the space on which a pattern executes can be changed, because execution policies such as Kokkos::RangePolicy take the execution space as a template parameter.

Memory Space: Each view stores its data in a memory space, which is fixed at compile time. If no memory space is passed to the view as a template argument, the default memory space of the default execution space is used.

This is only a very brief overview of the main concepts of Kokkos. A lecture series exploring these concepts and much, much more (the Kokkos Lectures) is available online.

What has been done so far?

The main aim of this project has been to explore the use of Kokkos for Lattice Quantum Chromodynamics simulations, mainly by developing staggered fermion kernels and comparing them to handcrafted code for specific architectures. Working benchmarks of the staggered action and the staggered Dirac operator have been developed. Since Kokkos itself targets shared-memory parallelism, the current stage of development is extending these kernels to run on multiple nodes and GPUs using MPI.
