Project reference: 1803
In this project we want to introduce the student to the field of malleability of scientific applications. We understand malleability as the capacity of a multi-process job to be resized on the fly, that is, to change its number of processes (and have its resources reallocated) during execution.
In recent years, malleability has proven to be a promising solution for high-throughput computing (HTC) and energy saving in large high-performance computing facilities.
For this purpose, we presented the Dynamic Management of Resources library (DMRlib), a malleability tool that helps developers make their applications malleable with little effort. Furthermore, we developed a reconfiguration policy responsible for deciding when and how a job is reconfigured.
In order to extend the perspective of malleability to other concerns, such as resource heterogeneity, job priorities, and power-awareness, we have to implement job reconfiguration policies that take into account the information related to each goal. For this reason, in this project we propose the development of a simulator that will simulate the execution of a workload composed of malleable and non-malleable jobs on a parallel system.
With this simulator, we will parametrize the jobs (not all of them have to be malleable) and tune the reconfiguration policies in order to determine the best configuration for fulfilling a given target.
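To make the idea concrete, the following is a minimal, time-stepped sketch of such a simulation. It is only an illustration under stated assumptions: it uses the Python standard library rather than SimPy, it assumes perfect linear speedup (a job's remaining work is measured in node-seconds), and every name in it (Job, simulate, the reconfigure flag) is hypothetical and not part of DMRlib. The policy shown is deliberately simple: running malleable jobs are shrunk towards their minimum size when the job at the head of the queue cannot start, and expanded into idle nodes when nobody is waiting.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    work: float           # remaining work in node-seconds (assumes linear speedup)
    min_nodes: int
    max_nodes: int        # equal to min_nodes for a non-malleable job
    arrival: float = 0.0  # submission time
    nodes: int = 0        # nodes currently allocated
    finish: float = -1.0  # completion time, set by simulate()


def simulate(jobs, total_nodes, reconfigure=True, dt=1.0):
    """Run the workload in steps of dt and return the makespan."""
    if any(j.min_nodes > total_nodes for j in jobs):
        raise ValueError("a job needs more nodes than the system has")
    pending = sorted(jobs, key=lambda j: j.arrival)
    queue, running = [], []
    now = 0.0
    while pending or queue or running:
        # Move newly arrived jobs into the waiting queue.
        while pending and pending[0].arrival <= now:
            queue.append(pending.pop(0))
        free = total_nodes - sum(j.nodes for j in running)
        # Shrink: release nodes only if that lets the queue head start.
        if reconfigure and queue:
            needed = queue[0].min_nodes - free
            spare = sum(j.nodes - j.min_nodes for j in running)
            if 0 < needed <= spare:
                for j in running:
                    give = min(j.nodes - j.min_nodes, needed)
                    j.nodes -= give
                    free += give
                    needed -= give
        # First-come, first-served start at the minimum size.
        while queue and queue[0].min_nodes <= free:
            job = queue.pop(0)
            job.nodes = job.min_nodes
            free -= job.nodes
            running.append(job)
        # Expand: hand idle nodes to running jobs when nobody is waiting.
        if reconfigure and not queue:
            for j in running:
                extra = min(j.max_nodes - j.nodes, free)
                j.nodes += extra
                free -= extra
        # Advance time and retire finished jobs.
        now += dt
        for j in running:
            j.work -= j.nodes * dt
        for j in [j for j in running if j.work <= 0]:
            j.finish = now
            running.remove(j)
    return now


def make_workload():
    # One malleable job (2 to 8 nodes) and one rigid 4-node job arriving later.
    return [Job("A", work=80, min_nodes=2, max_nodes=8),
            Job("B", work=40, min_nodes=4, max_nodes=4, arrival=5)]
```

Calling simulate(make_workload(), total_nodes=8) with and without reconfigure=True allows the makespan of the two configurations to be compared. A fresh workload is needed for each run because the jobs are modified in place.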
Project Mentor: Sergio Iserte
Project Co-mentor: Rafael Mayo
Site Co-ordinator: Maria-Ribera Sancho
The student will learn about resource management and distributed applications. Furthermore, the student will better understand how HPC production systems work during the execution of a parallel workload.
Student Prerequisites (compulsory):
Basic knowledge of Python and distributed computation models like MPI.
Student Prerequisites (desirable):
The simulator will be developed with SimPy, so previous knowledge of this library will be appreciated. Familiarity with resource management systems (RMS) is also desirable.
- Week 1: Training week
- Week 2: Literature Review Preliminary Report (Plan writing)
- Weeks 3–7: Project Development
- Week 8: Final Report write-up
Final Product Description
The result will be a modular, configurable simulator that manages jobs with different characteristics and supports different policies for selecting resources, scheduling jobs, and reconfiguring them.
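One possible way to obtain this modularity, shown here purely as a sketch, is to register each policy under a name and select it from a configuration dictionary. The names QueuedJob, SCHEDULERS, order_queue, and the "scheduling" configuration key are assumptions made for illustration, and only scheduling policies are shown; resource selection and reconfiguration policies could follow the same pattern.

```python
from typing import Callable, Dict, List, NamedTuple


class QueuedJob(NamedTuple):
    name: str
    nodes: int           # nodes requested
    est_runtime: float   # user-provided runtime estimate


# A scheduling policy takes the waiting queue and returns it in start order.
SchedulingPolicy = Callable[[List[QueuedJob]], List[QueuedJob]]


def fcfs(queue: List[QueuedJob]) -> List[QueuedJob]:
    """First-come, first-served: keep submission order."""
    return list(queue)


def shortest_first(queue: List[QueuedJob]) -> List[QueuedJob]:
    """Jobs with the shortest estimated runtime start first."""
    return sorted(queue, key=lambda j: j.est_runtime)


def smallest_first(queue: List[QueuedJob]) -> List[QueuedJob]:
    """Jobs requesting the fewest nodes start first."""
    return sorted(queue, key=lambda j: j.nodes)


# Hypothetical registry: new policies are added here without touching the core.
SCHEDULERS: Dict[str, SchedulingPolicy] = {
    "fcfs": fcfs,
    "shortest_first": shortest_first,
    "smallest_first": smallest_first,
}


def order_queue(queue: List[QueuedJob], config: dict) -> List[QueuedJob]:
    """Apply the scheduling policy named in the configuration (default: fcfs)."""
    policy = SCHEDULERS[config.get("scheduling", "fcfs")]
    return policy(queue)
```

With this layout, switching the policy under study is a one-line change in the configuration, for example order_queue(queue, {"scheduling": "shortest_first"}).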
Adapting the Project: Increasing the Difficulty
The difficulty of the project can be adjusted by implementing more or less complex policies for resource management, job scheduling, and job reconfiguration.
The student will have a workstation and access to the research cluster of our group in order to validate the outcomes.
Universitat Jaume I, Castelló de la Plana, Spain