Making the science flow

My project here in Ljubljana, Slovenia, is entitled “Visualisation support for scientific workflows with VisIt Kepler actor”. That title is specific enough that anyone not knee-deep in the field of scientific modelling will have no clue what it is about. Hence, before reporting on my activities, it would be wise to explain what I am actually supposed to do. I’ll start with a brief introduction to Scientific Workflows (SWFs).

As far as scientific workflows are concerned, all the calculations are black boxes – the user only cares about the input and output!
SWFs are used to automate and record data transformations, and in my case they come in the form of software. For example, imagine a researcher who wishes to simulate molecular dynamics. He has some initial input data (e.g. the initial positions of the particles, their velocities, etc.), which he feeds into his program. For the purpose of scientific workflows, we can treat the program as a black box – we don’t care how it performs the data transformation, or what language it is written in – all that concerns us is the output. For the molecular dynamics example, we want the output to be the states of the particles after a given time period. This is an example of a very simple scientific workflow: a single transformation of the initial data. Usually, however, more than one such transformation is required, and data is passed between different programs, often written in different languages.
The main goal of SWFs is to ease scientific research. If a simulation requires multiple data manipulations (as real simulations almost always do!), one could manually “move” the data from one program to another, but that is not only tiresome, it is also very prone to mistakes. If one first creates a workflow, on the other hand, the process becomes automated – all that is needed is to “feed” in the data and receive the final output. It is much like merging all the little “black boxes” (separate programs) into one big “black box” (the workflow), which can be “zoomed into” and edited, if desired.
Another goal of SWFs, often overlooked, is provenance. Provenance means providing evidence that scientific findings are reproducible. Scientific work is of good provenance if it is documented in such detail that it can be reproduced without doubt. Data provenance means that the data is archived, and that it is recorded how the data was collected and under what conditions it was transformed. In principle, the evidence (the input data) should never be contaminated. All of this may sound like definitions from archaeology, art history, bioinformatics and other fields where data is difficult to collect. However, provenance is just as important in computational science – all the findings have to be reproducible.
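To make the “little black boxes merged into one big black box” idea concrete, here is a minimal sketch in Python. It is not how Kepler works internally; the names `make_workflow`, `advance_particles` and `summarise` are made up for illustration, and the “simulation” is just particles moving in a straight line.

```python
from typing import Callable, Dict, List

# A "black box": anything that takes data in and hands data back.
Step = Callable[[Dict], Dict]


def make_workflow(steps: List[Step]) -> Step:
    """Merge several small black boxes into one big black box."""
    def workflow(data: Dict) -> Dict:
        # Pass the output of each step straight into the next one.
        for step in steps:
            data = step(data)
        return data
    return workflow


def advance_particles(state: Dict) -> Dict:
    """Toy stand-in for a simulation: move each particle for a time dt."""
    dt = state["dt"]
    positions = [x + v * dt
                 for x, v in zip(state["positions"], state["velocities"])]
    return {**state, "positions": positions}


def summarise(state: Dict) -> Dict:
    """Toy stand-in for post-processing: average particle position."""
    positions = state["positions"]
    return {"mean_position": sum(positions) / len(positions)}


# The user only "feeds" in the data and receives the final output.
simulate = make_workflow([advance_particles, summarise])
result = simulate({
    "positions": [0.0, 1.0, 2.0],
    "velocities": [1.0, 1.0, 1.0],
    "dt": 0.5,
})
print(result)
```

Because the workflow is just a list of steps, it can still be “zoomed into”: swapping or inserting a step means editing that list, not moving data around by hand.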
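One very simple way to picture provenance in code is to log a fingerprint of the data going into and coming out of every transformation. The sketch below is only an illustration under my own assumptions (the helper names `fingerprint` and `run_with_provenance` are hypothetical, and real workflow systems record far more, such as software versions and parameters).

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


def fingerprint(data: Any) -> str:
    """SHA-256 hash of the data's JSON form, used as a stable identifier."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def run_with_provenance(step: Callable[[Any], Any],
                        data: Any,
                        log: List[Dict[str, str]]) -> Any:
    """Run one transformation and record what went in and what came out."""
    input_hash = fingerprint(data)  # taken before the step runs
    output = step(data)
    log.append({
        "step": step.__name__,
        "input_sha256": input_hash,
        "output_sha256": fingerprint(output),
    })
    return output


def double_all(values: List[int]) -> List[int]:
    """Toy transformation that builds a new list and leaves the input alone."""
    return [2 * v for v in values]


log: List[Dict[str, str]] = []
raw = [1, 2, 3]
result = run_with_provenance(double_all, raw, log)
print(result)
print(log[0]["step"])
```

Anyone repeating the run can compare their own hashes against the log: matching input hashes show that the evidence was not contaminated, and matching output hashes show that the transformation was reproduced.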
Although computing is heavily used in business, a major portion of it comes from the sciences. What is more, scientific research is becoming more and more computationally intensive – it is no longer possible for a scientist to sit in his office with just a slide rule and perform calculations. That is why workflows are an essential tool in High Performance Computing (HPC), where the submitted jobs can be very complicated and take a very long time (so mistakes become progressively more expensive, and automating the process is essential).
Over 100 different scientific workflow products now exist, each providing different functionality, but the one I am working with is Kepler. The image shows my first workflow created in Kepler – a “Hello World!” program I wrote. The “actor” (or black box) called HelloWorld outputs the desired sentence, and the display actor (another black box) displays that output.
Of course, scientific workflows are only a means to an end, a tool to assist scientists. In my project they will be used for visualising a tokamak fusion reactor simulation. So, stay tuned for some insightful blog posts about the science behind fusion and an exploration of the Kepler software!