Scaling the Dissipative Particle Dynamics (DPD) code, DL_MESO, on large multi-GPGPU architectures
Project reference: 1913
Dissipative Particle Dynamics (DPD) is a stochastic particle method for mesoscale simulations of complex fluids. It is based on a coarse-grained approximation of the molecular structure of soft materials, in which beads represent large agglomerates or groups of simpler molecules such as water. This approach avoids the extremely small time and length scales of classical Molecular Dynamics solvers while retaining the intrinsic discrete nature of matter. However, realistic applications often require a very large number of beads to simulate the physics correctly, hence the need to scale to very large systems on the latest hybrid CPU-GPU architectures.
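To make the bead picture above concrete, here is a minimal sketch of the pairwise force in the standard Groot-Warren formulation of DPD (conservative, dissipative and random terms acting along the line between two beads). It is not taken from DL_MESO: the function name `dpd_pair_force`, the parameter defaults (`a=25.0`, `gamma=4.5`, `kT=1.0`, `rc=1.0`, `dt=0.01`) and the `rng` argument are illustrative assumptions, and a production code would evaluate this on the GPU over neighbour lists rather than in pure Python.

```python
import math
import random


def dpd_pair_force(ri, rj, vi, vj, a=25.0, gamma=4.5, kT=1.0, rc=1.0,
                   dt=0.01, rng=random):
    """Force on bead i due to bead j (illustrative sketch, not DL_MESO code).

    ri, rj: positions; vi, vj: velocities (3-component sequences).
    a, gamma, kT, rc, dt: assumed example parameters in reduced DPD units.
    rng: any object with a gauss(mu, sigma) method (assumption, eases testing).
    """
    rij = [p - q for p, q in zip(ri, rj)]
    r = math.sqrt(sum(c * c for c in rij))
    # Beads beyond the cutoff (or exactly overlapping) do not interact here.
    if r >= rc or r == 0.0:
        return (0.0, 0.0, 0.0)

    e = [c / r for c in rij]                  # unit vector from j to i
    vij = [p - q for p, q in zip(vi, vj)]     # relative velocity
    w = 1.0 - r / rc                          # weight function w_R; w_D = w**2

    # Fluctuation-dissipation relation: sigma^2 = 2 * gamma * kT
    sigma = math.sqrt(2.0 * gamma * kT)
    xi = rng.gauss(0.0, 1.0)                  # one random number per pair

    f_conservative = a * w
    f_dissipative = -gamma * w * w * sum(ec * vc for ec, vc in zip(e, vij))
    f_random = sigma * w * xi / math.sqrt(dt)

    magnitude = f_conservative + f_dissipative + f_random
    return tuple(magnitude * c for c in e)


if __name__ == "__main__":
    force = dpd_pair_force((0.5, 0.0, 0.0), (0.0, 0.0, 0.0),
                           (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
    print(force)
```

In a pairwise scheme the same random number must be used for both beads of a pair, so that the force on j is exactly the opposite of the force on i and momentum is conserved; the sketch computes only the force on bead i.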
This work focuses on benchmarking and optimizing, on novel supercomputers, an existing multi-GPU version of the DL_MESO (DPD) code, a discrete particle solver for mesoscale simulation of complex fluids. The code has so far been tested on up to 2048 GPUs; it needs further development to scale well on larger systems and to improve its per-GPU performance. Moreover, it has been tested only on simple cases, such as binary fluid mixture separation, and needs a robust evaluation with realistic applications such as phase separation, solute diffusion, and interactions between polymers. These often require extra features, such as Fast Fourier Transform algorithms, which are not currently implemented and usually represent a major challenge for scalability on novel architectures.
As a minimum, the student will benchmark the current version, address the factors that currently limit scaling on large supercomputers, and run performance analyses to identify possible bottlenecks and corresponding solutions to improve speedup. Depending on the student's experience, further improvements on the HPC side as well as new features for complex physics could be added. In particular, a plasma of electrically charged particles will be used as a benchmark, for which Ewald summation based methods, such as Smooth Particle Mesh Ewald, have to be implemented.
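As an illustration of the benchmarking task described above, the sketch below computes strong-scaling speedup and parallel efficiency from wall-clock timings of the same problem on different GPU counts. The function name `strong_scaling` and the timings in the usage example are hypothetical placeholders, not measurements of DL_MESO.

```python
def strong_scaling(timings):
    """Return (gpus, speedup, efficiency) rows for a fixed-size problem.

    timings: dict mapping GPU count -> wall-clock time in seconds.
    The smallest GPU count in the dict is used as the reference run.
    """
    base_gpus = min(timings)
    base_time = timings[base_gpus]
    rows = []
    for gpus in sorted(timings):
        speedup = base_time / timings[gpus]
        # Ideal speedup is gpus / base_gpus; efficiency is the fraction achieved.
        efficiency = speedup / (gpus / base_gpus)
        rows.append((gpus, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Placeholder timings for illustration only, not real benchmark data.
    example = {1: 100.0, 2: 50.0, 4: 40.0}
    for gpus, speedup, efficiency in strong_scaling(example):
        print(f"{gpus:5d} GPUs  speedup {speedup:6.2f}  efficiency {efficiency:6.1%}")
```

A drop in efficiency as the GPU count grows is a common first hint of where communication or load imbalance starts to dominate, which is then worth confirming with a profiler.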
Project Mentor: Jony Castagna
Project Co-mentor: Vassil Alexandrov
Site Co-ordinator: Luke Mason
Participant: Davide Di Giusto
The student will learn to benchmark, profile and modify multi-GPU code written mainly in Fortran and CUDA, following a typical domain decomposition implemented with MPI libraries. S/he will also gain a basic understanding of the DPD methodology and its impact on mesoscale simulations, as well as familiarity with proper software development procedures using version control software, IDEs and tools for parallel profiling on GPUs.
Student Prerequisites (compulsory):
Good knowledge of Fortran, MPI and CUDA programming is required, as well as of parallel programming for distributed memory.
Student Prerequisites (desirable):
Some skill in developing mixed code, such as Fortran/CUDA, will be an advantage, as will experience in multi-GPU programming using CUDA/MPI.
These can be tailored to the student once he/she is selected.
- Week 1: Training week
- Week 2: Literature Review, Preliminary Report (Plan writing)
- Weeks 3-7: Project Development
- Week 8: Final Report write-up
Final Product Description:
The final product will be an internal report, convertible to a conference or, better, a journal paper, with benchmarks and a comparison of the improved multi-GPU version of DL_MESO.
Adapting the Project: Increasing the Difficulty:
The project is on the appropriate cognitive level, taking into account the timeframe and the need to submit a final working product and one report.
Adapting the Project: Decreasing the Difficulty:
The topic will be researched and the final product will be designed in full, but some of the features may not be developed, to ensure a working product with limited features at the end of the project.
The student will need access to multi-GPU machines and standard computing resources (laptop, internet connection).
Hartree Centre – STFC