Performance visualization for bioinformatics pipelines

Project reference: 1715
Supercomputers can help speed up drug discovery using machine learning. Within the project, the student will work with our tool, which deploys a new programming model and scheduler for running complex pipelines in a distributed environment on an HPC cluster. The goal of the project is to bring performance information from workflow pipelines to the users. The student's task is to present more detailed information about pipeline performance to the users, either through a custom visualization or through exports to an existing performance analysis tool.
The attached picture shows a protein and the directed acyclic graph of a small cross-validation example.
Project Mentor: Jan Martinovič
Site Co-ordinator: Karina Pešatová
Learning Outcomes:
Basic knowledge of processing and visualizing the results of performance analysis of machine learning pipelines, such as those used in drug discovery
Student Prerequisites (compulsory):
Basic programming skills in
- C, C++
- Python
Student Prerequisites (desirable):
- 2D data visualization
- Parallel processing
Training Materials:
http://www.mcs.anl.gov/research/projects/perfvis/software/viewers/
C++: http://www.cplusplus.com/doc/tutorial/
Python: https://www.python.org/doc/
Workplan:
Week 1: Training
Week 2: Work plan setting
Weeks 3–6: Implementation of a software tool that extracts performance information from workflow pipelines and visualises it, either through a custom implementation or through exports to an existing performance analysis tool
Week 7: Visualization of the results
Week 8: Final report completion and final presentation preparation
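The export option mentioned in the workplan can be illustrated with a minimal sketch. It assumes that each pipeline task is available as a record with a name, a worker id, and start and end times in seconds; these field names and the sample tasks are hypothetical and do not reflect the actual output of the project's tool. The Chrome Trace Event JSON format is used here only as one example of a target that existing timeline viewers can load; the project may choose a different format.

```python
import json


def to_trace_events(tasks):
    """Convert task records into Trace Event 'complete' events (ph = "X")."""
    events = []
    for task in tasks:
        events.append({
            "name": task["name"],
            "ph": "X",  # complete event: a start time plus a duration
            "ts": round(task["start_s"] * 1_000_000),  # microseconds
            "dur": round((task["end_s"] - task["start_s"]) * 1_000_000),
            "pid": 0,  # a single pipeline run is shown as one process
            "tid": task["worker"],  # one timeline row per worker
        })
    return {"traceEvents": events}


# Hypothetical task records for a small cross-validation pipeline.
tasks = [
    {"name": "load_data", "worker": 0, "start_s": 0.0, "end_s": 1.5},
    {"name": "train_fold_1", "worker": 0, "start_s": 1.5, "end_s": 4.0},
    {"name": "train_fold_2", "worker": 1, "start_s": 1.5, "end_s": 4.5},
    {"name": "evaluate", "worker": 0, "start_s": 4.5, "end_s": 5.0},
]

trace = to_trace_events(tasks)
trace_json = json.dumps(trace, indent=2)
```

Writing `trace_json` to a file would give a document that a Trace Event viewer can display as one row per worker.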
Final Product Description:
A visualisation of performance information from workflow pipelines
Adapting the Project: Increasing the Difficulty:
In the original setting, the project focuses on post-mortem analysis. The difficulty can be increased by providing the visualization in real time.
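As an illustration of what a post-mortem analysis might compute once a pipeline has finished, the following minimal sketch derives the total run time (makespan) and the fraction of that time each worker spent executing tasks. The record fields and sample values are hypothetical, and the sketch assumes that tasks on the same worker do not overlap.

```python
from collections import defaultdict


def summarize(tasks):
    """Return the makespan and per-worker utilisation of a finished run."""
    start = min(t["start_s"] for t in tasks)
    end = max(t["end_s"] for t in tasks)
    makespan = end - start

    # Sum the time each worker spent executing tasks.
    busy = defaultdict(float)
    for t in tasks:
        busy[t["worker"]] += t["end_s"] - t["start_s"]

    utilisation = {worker: b / makespan for worker, b in busy.items()}
    return makespan, utilisation


# Hypothetical task records collected after the run has completed.
tasks = [
    {"name": "load_data", "worker": 0, "start_s": 0.0, "end_s": 1.5},
    {"name": "train_fold_1", "worker": 0, "start_s": 1.5, "end_s": 4.0},
    {"name": "train_fold_2", "worker": 1, "start_s": 1.5, "end_s": 4.5},
    {"name": "evaluate", "worker": 0, "start_s": 4.5, "end_s": 5.0},
]

makespan, utilisation = summarize(tasks)
```

A real-time variant would need to update such figures incrementally as task events arrive, rather than reading the complete record set at the end.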
Resources:
Software
- Python
- C, C++ programming environment
Hardware
- Salomon cluster
Access to the appropriate software and hardware will be provided by the IT4Innovations National Supercomputing Center.