High-level Visualizations of Performance Data
Project reference: 1816
In the world of parallel programs, performance plays a key role. Mature tools (e.g. Scalasca) make it possible to measure an application's performance, analyse it, and draw conclusions from the results. The aim of this project is to take the results produced by such a tool and visualize them at a higher level of abstraction. Specifically, the performance data will be visualized within an abstract model of communication that will be provided.
The student will develop a tool that takes performance data and an abstract model of communication as inputs and produces a view of the data on top of that model. It is important to note that the data collected from an application’s run can hold two fundamentally different kinds of information: statistical data or trace logs (data measured in time). The tool will be able to visualize both kinds. For statistical data, it highlights the parts where the application spent most of its time, which path(s) were used most, etc. The visualization of trace logs, on the other hand, will allow the user to see the different states over time, i.e. a replay of the trace log. Because statistical information can also be derived from trace logs, it can be interesting to visualize its development over time as a “cumulative replay”.
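The idea of a “cumulative replay” can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tuple-based trace record, the region names, and the helper functions `cumulative_replay` and `hottest` are hypothetical and do not reflect the actual Scalasca data formats, which require a dedicated reader. The sketch only shows how per-region statistics could be accumulated while stepping through enter/leave events in time order.

```python
from collections import defaultdict

# Hypothetical, simplified trace record: (timestamp, rank, kind, region).
# This is NOT the Scalasca trace format; it only stands in for one.
TRACE = [
    (0.0, 0, "enter", "compute"),
    (0.0, 1, "enter", "compute"),
    (2.0, 0, "leave", "compute"),
    (2.0, 0, "enter", "send"),
    (3.0, 0, "leave", "send"),
    (3.0, 1, "leave", "compute"),
    (3.0, 1, "enter", "recv"),
    (3.5, 1, "leave", "recv"),
]


def cumulative_replay(trace):
    """Yield (timestamp, totals) after every completed region visit.

    totals maps a region name to the time spent in it so far, summed
    over all ranks, i.e. the statistical view as it develops in time.
    """
    open_regions = defaultdict(list)  # rank -> stack of (region, enter_time)
    totals = defaultdict(float)       # region -> accumulated time
    # sorted() is stable, so events sharing a timestamp keep their order.
    for timestamp, rank, kind, region in sorted(trace, key=lambda e: e[0]):
        if kind == "enter":
            open_regions[rank].append((region, timestamp))
        elif kind == "leave":
            opened, entered_at = open_regions[rank].pop()
            if opened != region:
                raise ValueError(f"unbalanced trace on rank {rank}")
            totals[region] += timestamp - entered_at
            yield timestamp, dict(totals)  # snapshot copy for this step


def hottest(totals):
    """Return the region with the largest accumulated time so far."""
    return max(totals, key=totals.get)


if __name__ == "__main__":
    for timestamp, totals in cumulative_replay(TRACE):
        print(f"t={timestamp}: {totals} -> highlight {hottest(totals)!r}")
```

Each yielded snapshot corresponds to one frame of the replay; in a real tool the accumulated values would be mapped onto elements of the communication model (for example by colouring them) rather than printed.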
Project Mentor: Jan Martinovič, Ph.D.
Project Co-mentor: Martin Šurkovský
Site Co-ordinator: Karina Pesatova
The student will become acquainted with performance measurement and analysis of parallel programs, and will gain insight into data visualization.
Student Prerequisites (compulsory):
A good level of programming skill, and experience with developing GUI applications and processing different file formats.
Student Prerequisites (desirable):
Knowledge of C++ and MPI (point-to-point is enough), and a basic understanding of Petri nets, especially Coloured Petri nets.
- Message-Passing Interface standard (v3.1, point-to-point communication)
- Coloured Petri nets (http://www.springer.com/gp/book/9783642002830)
At the beginning, the student should get acquainted with the Scalasca toolset, its capabilities, and the standard ways of visualizing event-based data. During the second week they should prepare the basic components the tool will be composed of, particularly loading and processing input data. The visualization of statistical information comes next (1 to 1.5 weeks). The rest of the time should be devoted to visualizing event-based data (trace logs) and experimenting with the visualization of different kinds of data.
Final Product Description:
A tool allowing a user to see performance data collected from a parallel program’s run in the context of its abstract model of communication.
Adapting the Project: Increasing the Difficulty:
The difficulty of the project can be increased by improving the quality of information the user can see, e.g. by offering a more detailed view of the data rather than merely presenting it. Another possibility is to visualize the results of analyses performed in Scalasca, e.g. the critical path.
Adapting the Project: Decreasing the Difficulty:
The resulting tool will have a GUI, which leaves room for decreasing the difficulty. Moreover, in the worst case, it will be enough to focus only on statistical data and omit the visualization of trace logs.
- Materials provided with the Scalasca toolset (www.scalasca.org, especially the user guide and the description of the data format).
- Tools from the area of process mining, such as ProM (http://www.promtools.org) or Disco (https://fluxicon.com/disco/), as a source of inspiration for data visualization.