In my previous post I discussed job scheduling algorithms; in this one I will cover how we are working with them.
In my project, the NEXTGENIO research group (which is part of PRACE) has built an HPC system simulator that lets us implement different job scheduling algorithms and test their effectiveness, without having to use ARCHER (Edinburgh’s supercomputer), which is expensive to use. The simulator is not finished yet: there is still some software engineering to do, output features to add and tests to run. It also had to be made compatible with macOS, as it had been developed on Linux; that porting work took up my first two weeks.
My supervisor wanted a selectable output feature added to the project: the user should be able to choose between two trace formats, OTF2 (Open Trace Format 2) and Paraver (a performance-analysis browser from the Barcelona Supercomputing Centre). This has been implemented, and written so that other output types can be added to the project with relative ease.
Initially, the simulator only had the First-Come-First-Serve algorithm. In this project we strive for better performance, so let’s look at some other algorithms. First off, the score-based priority algorithm sorts jobs by score, incorporating a fair-share scheduling weight that adjusts a job’s score based on the total number of compute nodes requested by the user, the number of jobs they are running, their recent history and the fraction of their jobs completed. Next, we have multi-queue priority, which uses several queues with different levels of priority, with certain conditions a job must meet to enter each queue. Finally, we have backfilling: here we opportunistically run low-priority jobs when insufficient resources are available to start the high-priority jobs, filling the gaps they leave. That’s our list of job scheduling algorithms, which sort the order of the job waiting queue. The scheduling algorithm runs on job events (when a job arrives, starts, finishes or aborts), and if there have been no events in the past 10 seconds it runs anyway.
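To make the score-based idea concrete, here is a minimal sketch of how a fair-share weight might order the wait queue. The `Job` fields and the numeric weights are my own illustrative assumptions, not the simulator's real values:

```python
from dataclasses import dataclass

@dataclass
class Job:
    user: str
    nodes_requested: int      # compute nodes asked for by this job
    jobs_running: int         # jobs the user already has running
    recent_usage: float       # recent node-hours consumed by the user
    completed_fraction: float # fraction of the user's jobs that completed

def fair_share_score(job: Job) -> float:
    # Hypothetical weighting: penalise users who request many nodes,
    # already run many jobs, or have heavy recent usage; reward users
    # whose jobs mostly run to completion.
    score = 100.0
    score -= 0.5 * job.nodes_requested
    score -= 2.0 * job.jobs_running
    score -= 0.1 * job.recent_usage
    score += 10.0 * job.completed_fraction
    return score

wait_queue = [
    Job("alice", nodes_requested=64, jobs_running=3,
        recent_usage=200.0, completed_fraction=0.9),
    Job("bob", nodes_requested=8, jobs_running=0,
        recent_usage=10.0, completed_fraction=0.5),
]
# Highest score runs first: bob's small, light-usage request jumps ahead.
wait_queue.sort(key=fair_share_score, reverse=True)
```

In a real scheduler this sort would be re-run on every job event (and on the 10-second timeout) so scores stay current as the queue changes.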
Task-mapping algorithms determine which compute nodes to run a job on. The goal is to minimise communication overhead and reduce cross-job interference. Random mapping is considered the worst case. Round-robin keeps the nodes in an ordered list, and when a job ends its nodes are appended to the list; over time, the fragmentation of the list becomes significant and the communication overhead drastically increases. Dual-End sets a time threshold that classifies every job as short or long: short jobs search for unoccupied nodes from one end of the list, long jobs from the other.
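The Dual-End idea can be sketched in a few lines. This is an illustrative toy, not the simulator's code; the function name, the one-hour threshold and the node identifiers are all assumptions:

```python
def dual_end_map(free_nodes, job_runtime, nodes_needed, threshold=3600.0):
    """Pick nodes for a job from a shared free list.

    Short jobs (runtime below the threshold) take nodes from the front
    of the list, long jobs from the back, so the churn from short jobs
    fragments a different end of the machine than long-lived jobs.
    """
    if len(free_nodes) < nodes_needed:
        return None  # insufficient resources: the job keeps waiting
    if job_runtime < threshold:
        chosen = free_nodes[:nodes_needed]   # short job: front of the list
        del free_nodes[:nodes_needed]
    else:
        chosen = free_nodes[-nodes_needed:]  # long job: back of the list
        del free_nodes[-nodes_needed:]
    return chosen

nodes = list(range(8))
short = dual_end_map(nodes, job_runtime=600, nodes_needed=2)   # front: [0, 1]
long_ = dual_end_map(nodes, job_runtime=7200, nodes_needed=3)  # back: [5, 6, 7]
```

After both allocations the free list is the contiguous middle `[2, 3, 4]`, which is exactly the point: the two job classes chew away at opposite ends rather than interleaving.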
Each job has a priority value P(J), and the wait queue is ordered by highest priority. The priority is the sum of five different heuristics. Minimal requirement specifies details like the number of nodes. Aging helps avoid job starvation, and is shown in detail on the right, where the age factor is a multiplicative factor of choice. Deadline maximises the number of jobs that terminate within their deadline. The license heuristic gives a higher score to jobs requiring critical resources on nodes, such as NVRAM, which hasn’t been implemented yet. Response minimises the wait time of jobs with the shortest execution times via a boost value, which is the backfilling component of this algorithm. This paper goes into good detail about backfilling.
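A minimal sketch of P(J) as a sum of the five heuristics might look like the following. Every weight, field name and formula here is a placeholder of my own choosing; only the overall shape (five additive terms, aging scaled by a multiplicative age factor) comes from the description above:

```python
from dataclasses import dataclass

@dataclass
class Job:
    nodes: int                    # compute nodes requested
    wait_time: float              # seconds spent in the wait queue
    runtime_estimate: float       # user-supplied runtime estimate, seconds
    has_deadline: bool
    needs_critical_resource: bool # e.g. NVRAM on the node

AGE_FACTOR = 0.01  # the multiplicative factor of choice

def priority(job: Job) -> float:
    # P(J) as a sum of the five heuristics; weights are illustrative.
    minimal = 1.0 / job.nodes                        # favour small requests
    aging = AGE_FACTOR * job.wait_time               # grows while queued
    deadline = 5.0 if job.has_deadline else 0.0      # push deadline jobs
    license_h = 5.0 if job.needs_critical_resource else 0.0
    response = 10.0 / job.runtime_estimate           # boost short jobs
    return minimal + aging + deadline + license_h + response
```

The aging term is what prevents starvation: two otherwise identical jobs differ only in `wait_time`, so the one that has queued longer always wins eventually.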
I’ve been making good use of my time off. First, Jakub and I went up to explore the Royal Observatory on Braid and Blackford Hill beside where we work. The temperatures have been so high here that there were fires on top of the hill and firetrucks were needed to put them out. The observatory was closed when we went up but I think we will revisit and see if we can get to the top.