Which is the busiest node in the HPC cluster?
What is a high-performance computer (HPC)? In simple terms, nowadays it is a cluster of many ‘mini’ computers (nodes) linked by the same network. When a big calculation is submitted to the HPC, it is split into thousands or even millions of mini tasks, which individual nodes work on simultaneously. This is what makes the calculation faster.
However, sometimes you have to wait for the result of one task before you can complete another, so the mini tasks cannot be split between the nodes as evenly as we would like. As a result, some nodes are always busier than others. But how do we find out which ones, and how do we make the scheduling as fair as possible? Obviously, it is impossible to schedule millions of tasks by hand, and a poorly balanced schedule wastes a lot of computing time, which is neither fast nor economical.
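To make the point about waiting a bit more concrete, here is a minimal sketch of a greedy scheduler. The task names, durations and the scheduling rule are all made up for illustration; this is not how our package schedules work, only a toy that shows why dependencies lead to uneven load.

```python
# Hypothetical tasks: name -> (duration, tasks it has to wait for).
tasks = {
    "a": (3, []),
    "b": (2, ["a"]),
    "c": (2, ["a"]),
    "d": (4, ["b", "c"]),
}


def schedule(tasks, n_workers):
    """Toy greedy scheduler: give each task to the worker that can start it earliest."""
    worker_free = [0] * n_workers  # time at which each worker becomes idle
    busy = [0] * n_workers         # total time each worker spends computing
    finish = {}                    # task name -> time at which it finishes
    remaining = dict(tasks)
    while remaining:
        # Pick the first task whose dependencies have all finished.
        for name, (duration, deps) in remaining.items():
            if all(d in finish for d in deps):
                break
        else:
            raise ValueError("cyclic dependencies")
        ready_at = max((finish[d] for d in deps), default=0)
        # Ties go to the worker with the lowest index.
        w = min(range(n_workers), key=lambda i: max(worker_free[i], ready_at))
        start = max(worker_free[w], ready_at)
        finish[name] = start + duration
        worker_free[w] = finish[name]
        busy[w] += duration
        del remaining[name]
    return busy, max(finish.values())


busy, makespan = schedule(tasks, 2)
print(busy, makespan)  # [9, 2] 9
```

With two workers, the first one ends up computing for 9 time units and the second for only 2. Even a smarter scheduler could not finish in under 9 units here, because the chain a, b, d has to run one task after another.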
Luckily, my group at IT4I has foreseen the problem and started to solve it by developing a package with a C++ core and a Python interface. My job here is to write Python code that visualises the traffic between the nodes, as shown in the figure below:
This is a network diagram showing the traffic between the server and 8 workers (W0-W7). Each arrow represents a flow of data from one worker to another, and its width corresponds to the volume of data. You can clearly see that a large amount of data is sent from W0 to W5, as well as from W3 to W6. Finally, after a lot of tedious communication between the workers, they send their final results to the server, which returns them to us.
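The idea behind such a diagram can be sketched in a few lines of plain Python. The traffic log below is invented for illustration (these are not our real measurements), and the function names are hypothetical. The sketch adds up the data sent along each arrow, scales the arrow widths by volume, and picks the worker that handles the most data.

```python
from collections import defaultdict

# Hypothetical traffic log: (source, destination, bytes sent).
transfers = [
    ("W0", "W5", 900),
    ("W3", "W6", 800),
    ("W1", "W2", 50),
    ("W5", "server", 100),
    ("W6", "server", 100),
    ("W0", "W5", 300),
]


def aggregate(transfers):
    """Sum the bytes sent along each (source, destination) arrow."""
    volume = defaultdict(int)
    for src, dst, nbytes in transfers:
        volume[(src, dst)] += nbytes
    return dict(volume)


def node_load(volume):
    """Total bytes each node sends plus receives."""
    load = defaultdict(int)
    for (src, dst), nbytes in volume.items():
        load[src] += nbytes
        load[dst] += nbytes
    return dict(load)


def arrow_widths(volume, max_width=8.0):
    """Scale arrow widths linearly so the heaviest arrow gets max_width."""
    peak = max(volume.values())
    return {edge: max_width * nbytes / peak for edge, nbytes in volume.items()}


volume = aggregate(transfers)
load = node_load(volume)
busiest = max(load, key=load.get)
widths = arrow_widths(volume)
print(busiest, load[busiest])  # W5 1300
```

The `widths` dictionary is the kind of per-arrow value a plotting library would take as its line width. Counting both sent and received data as "load" is just one possible definition of "busiest".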
In the process of making this, I compared two Python modules that can generate network diagrams: NetworkX and Graph-tool. NetworkX focuses on network analysis and network-related data mining, e.g. finding the shortest path between two nodes. However, since more or less all of our nodes communicate with each other, shortest paths are not very interesting for us. Graph-tool, on the other hand, is better at general data visualisation and less focused on network analysis.
So, do you know which module I chose to draw the graph above? 😉