It has already been a month here in Castello de la Plana, Spain, for the PRACE Summer of HPC programme, and one thing I have realized for sure is that time passes really fast when you are having fun. But what do I mean by fun? I will start by telling you how I spend my weekends here. Taking trips to the towns near Castello, for example Valencia, walking around the city and visiting the beach are some of the things I do after I finish work. It’s really relaxing.
But Summer of HPC is of course not only about weekend trips; it is also about working on interesting projects during the week. So, what is my project about? It’s called ‘Dynamic management of resources simulator’, and I know the title doesn’t say much, which is why I will spend the rest of this blog post trying to explain it.
Let’s decode the title. First, what does ‘management of resources’ mean? In HPC facilities, many applications can run concurrently and compete for the same resources. Because of this, ‘something’ has to boss these applications around, giving resources to some of them while the others wait. This is where the resource manager (the ‘something’) kicks in: a program with one specific job, to distribute the resources among the applications according to a policy.
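To make ‘a policy’ a bit more concrete, here is a minimal sketch of one of the simplest policies, first come first served. It is only an illustration, not the code of our project: the function name `fcfs_schedule` and the `(name, nodes)` job tuples are made up for this example.

```python
from collections import deque

def fcfs_schedule(free_nodes, queue):
    """Start jobs in arrival order until the job at the head of the queue no longer fits.

    queue: list of (name, nodes_requested) tuples (hypothetical job description).
    Returns (names of started jobs, names of jobs still waiting, nodes left free).
    """
    waiting = deque(queue)
    started = []
    while waiting and waiting[0][1] <= free_nodes:
        name, nodes = waiting.popleft()
        free_nodes -= nodes
        started.append(name)
    return started, [name for name, _ in waiting], free_nodes

# An 8-node cluster and four jobs in arrival order.
started, waiting, free = fcfs_schedule(8, [("A", 4), ("B", 3), ("C", 2), ("D", 1)])
print(started, waiting, free)
```

With these numbers, A and B start, while C and D have to wait even though one node stays idle. That kind of waste is what a smarter policy tries to avoid.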
So far so good: now we understand the ‘management of resources’ part of the title. What about ‘Dynamic’? Something that is not obvious from the project’s title is that in this project we deal with malleable applications or, better said, with malleability. Malleability is nothing more than an application’s ability to change the number of resources it uses dynamically, while it is running (on the fly). This means an application can run with fewer resources and so let more applications run concurrently, which increases the global throughput. Now it makes sense to have a resource manager that exploits malleability.
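Here is a small sketch of what ‘exploiting malleability’ could look like: when a new application needs nodes and none are free, the resource manager asks the running applications to give some back. The `Job` class, its `min_nodes` field and the `shrink_to_fit` function are assumptions made up for this illustration; a real resource manager has to deal with much more.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    nodes: int      # nodes the job currently holds
    min_nodes: int  # smallest size the job can shrink to (hypothetical field)

def shrink_to_fit(running, free_nodes, needed):
    """Shrink running jobs, in order, until `needed` nodes are free.

    Jobs are modified in place; the new number of free nodes is returned.
    If the jobs cannot give back enough, the result stays below `needed`.
    """
    for job in running:
        if free_nodes >= needed:
            break
        give_back = min(job.nodes - job.min_nodes, needed - free_nodes)
        job.nodes -= give_back
        free_nodes += give_back
    return free_nodes

# A is malleable (can go from 4 down to 2 nodes), B cannot shrink at all.
running = [Job("A", 4, 2), Job("B", 4, 4)]
free = shrink_to_fit(running, free_nodes=0, needed=2)
print(free, running)
```

In this example A shrinks from 4 to 2 nodes, B is left alone, and the 2 freed nodes can go to a newly arrived application.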
We have explained 80% of the title; the other 20% is the word ‘simulator’. Imagine that you want to run some applications on an HPC facility. What do you have to do? First you have to log in to the cluster, wait for the nodes to be freed (so that they can be allocated to you), then upload all the applications you want to run on the cluster and run them. Finally, pray to God that no-one interfered with your results. Too much effort. What about getting the same behavior even on a single-core machine (it’s 2018, good luck finding a single-core…)? This is what the simulator is for. Let’s say that after this blog post you get very interested in malleability, and after some time you come up with an algorithm that exploits it. Will you test your algorithm on the actual system? Well, of course you will at some point, but first you can test it on the simulator at no cost.
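To give a feeling for how such a test can run on a laptop, here is a toy discrete-event simulation. It is a sketch under strong assumptions and not our actual simulator: it uses only Python’s standard library, it assumes perfect linear scaling (runtime = work / nodes), it fixes each job’s size at start time instead of reconfiguring it on the fly, and all names in it (`simulate`, `rigid`, `shrunk`) are invented for the example.

```python
import heapq

def simulate(total_nodes, jobs, pick_size):
    """Return the makespan of running `jobs` in first-come-first-served order.

    jobs: list of (work, min_nodes, max_nodes), with work in node-seconds.
    pick_size: policy function (min_nodes, max_nodes) -> nodes to allocate.
    Assumption: perfect linear scaling, so runtime = work / nodes.
    """
    now, free = 0.0, total_nodes
    finish_events = []  # min-heap of (finish_time, nodes_to_release)
    pending = list(jobs)
    while pending or finish_events:
        # Start queued jobs in order while the head of the queue fits.
        while pending and pick_size(pending[0][1], pending[0][2]) <= free:
            work, lo, hi = pending.pop(0)
            nodes = pick_size(lo, hi)
            free -= nodes
            heapq.heappush(finish_events, (now + work / nodes, nodes))
        if not finish_events:
            raise ValueError("a job asks for more nodes than the cluster has")
        # Jump the clock to the next job completion and release its nodes.
        now, released = heapq.heappop(finish_events)
        free += released
    return now

def rigid(lo, hi):
    return hi  # every job gets the maximum size it asks for

def shrunk(lo, hi):
    return lo  # every job runs at its minimum size

# Four identical jobs on a 4-node cluster: 6 node-seconds of work, 2 to 3 nodes.
workload = [(6, 2, 3)] * 4
rigid_makespan = simulate(4, workload, rigid)
shrunk_makespan = simulate(4, workload, shrunk)
print(rigid_makespan, shrunk_makespan)
```

With this made-up workload, the rigid policy leaves one node idle all the time, while the shrunk jobs fill the whole cluster and the workload finishes earlier. Comparing policies like this, without touching a real cluster, is the whole point of a simulator.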
To summarise, the purpose of this project is to introduce the concept of malleability to the field of resource managers by developing a simulator that executes a workload. By tuning different reconfiguration policies, we can determine the best configuration for fulfilling a given target without running the workload on an actual system. We are using Python and SimPy (a simulation framework). We have implemented the simulator, and right now we are in the phase of testing it. I am sorry that I don’t have any pictures related to my project, but I am writing code that I run on my laptop, not even on an HPC system. On the other hand, since I am close to Valencia, I can share some paella…