Project reference: 1717

Today’s multi-core, many-core and accelerator hardware provides a tremendous amount of floating-point operations (FLOPs). However, CPU and GPU FLOPs cannot be harvested in the same manner. While the CPU is designed to minimize the latency of a stream of individual operations, the GPU tries to maximize the throughput. Even worse, porting modern C++ codes to GPUs via CUDA ties the code to a single vendor. However, there are other powerful accelerators that would need yet another code path, and thus code duplication.

The great diversity and short life cycle of today’s HPC hardware does not allow for hand-written, well-optimized assembly code anymore. This raises the question whether we can obtain portability and performance from high-level languages with greater abstraction possibilities, such as C++.

Wouldn’t it be ideal if we had a single code base capable of running on all devices, like CPUs (Intel Xeon), GPUs (NVIDIA P100, AMD R9 Fury) and accelerators (Intel Xeon Phi)?

With the help of AMD’s open-source HIP framework (Heterogeneous-Compute Interface for Portability), it is possible to develop a single code that can run on all of these hardware platforms alike.
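To give a flavour of what such single-source GPU code looks like, here is a minimal, hedged sketch (not taken from the project’s FMM code) of a HIP kernel and its host-side launch; the same source can be compiled for NVIDIA and AMD GPUs.

```cpp
// Minimal HIP sketch (illustration only, not the project's FMM code):
// a single kernel source that can be compiled for NVIDIA and AMD GPUs.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void axpy(int n, float a, const float* x, float* y) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] += a * x[i];
}

int main() {
  const int n = 1 << 20;
  std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

  float *dx = nullptr, *dy = nullptr;
  hipMalloc(reinterpret_cast<void**>(&dx), n * sizeof(float));
  hipMalloc(reinterpret_cast<void**>(&dy), n * sizeof(float));
  hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
  hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

  const int block = 256;
  const int grid = (n + block - 1) / block;
  hipLaunchKernelGGL(axpy, dim3(grid), dim3(block), 0, 0, n, 3.0f, dx, dy);

  hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
  std::printf("y[0] = %f\n", hy[0]);  // expect 5.0 = 2.0 + 3.0 * 1.0

  hipFree(dx);
  hipFree(dy);
  return 0;
}
```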

In this project we turn our efforts towards a performance-portable fast multipole method (FMM). Depending on the interest of the student, we will pursue different goals. First, the already available CPU/GPU version of the FMM can be adapted to support HIP for the construction and efficient use of multiple sparse/dense octree data structures. Second, a special (compute-intensive) kernel for more advanced algorithmic computations can be adapted to HIP.

The challenge of both assignments is to embed/extend the code in a performance-portable way. This minimizes the adaptation effort when changing from one HPC platform to another.

What is the fast multipole method? The FMM is used as a Coulomb solver and allows the computation of long-range forces for molecular dynamics, e.g. in GROMACS. A straightforward approach is limited to small particle numbers N due to its O(N^2) scaling. Fast summation methods like PME, multigrid or the FMM are capable of reducing the algorithmic complexity to O(N log N) or even O(N). However, each fast summation method has auxiliary parameters, data structures and memory requirements which need to be provided. The layout and implementation of such algorithms on modern hardware strongly depends on the available features of the underlying architecture.
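To make the O(N^2) baseline concrete, the following is a minimal sketch (illustration only, not the project’s code) of the direct pairwise Coulomb sum that fast summation methods such as the FMM are designed to replace:

```cpp
// Direct O(N^2) Coulomb potential (illustration only, not the project's code).
#include <cmath>
#include <vector>

struct Particle { double x, y, z, q; };

// Potential at every particle due to all others (prefactor 1/(4*pi*eps0) omitted).
std::vector<double> direct_potential(const std::vector<Particle>& p) {
  const std::size_t n = p.size();
  std::vector<double> phi(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {      // O(N^2): every pair is visited
    for (std::size_t j = 0; j < n; ++j) {
      if (i == j) continue;
      const double dx = p[i].x - p[j].x;
      const double dy = p[i].y - p[j].y;
      const double dz = p[i].z - p[j].z;
      phi[i] += p[j].q / std::sqrt(dx * dx + dy * dy + dz * dz);
    }
  }
  return phi;
}
```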

3D projection of the future workplace of a 2017 PRACE student at JSC

Project Mentor: Andreas Beckmann

Site Co-ordinator: Ivo Kabadshow

Learning Outcomes:

The student will familiarize himself/herself with current state-of-the-art GPUs (e.g. P100/R9 Fury) and accelerators (Intel Xeon Phi ‘Knights Landing’). He/she will learn how the GPU/accelerator functions on a low level and use this knowledge to optimize scientific software for CPUs/GPUs and accelerators in a unified code base. He/she will use state-of-the-art benchmarking tools to achieve optimal performance portability for the kernels and kernel drivers which are time-critical in the application.

Student Prerequisites (compulsory):

Prerequisites

  • At least 5 years of programming experience in C++
  • Basic understanding of template metaprogramming
  • “Extra-mile” mentality

Student Prerequisites (desirable): 

  • HIP/CUDA or general GPU knowledge desirable, but not required
  • C++ template metaprogramming
  • Interest in C++11/14/17 features
  • Interest in low-level performance optimizations
  • Ideally student of computer science, mathematics, but not required
  • Basic knowledge on benchmarking, numerical methods
  • Mild coffee addiction
  • Basic knowledge of git, LaTeX, TikZ

Training Materials:

Just send an email … training material strongly depends on your personal level of knowledge. We can provide early access to the GPU cluster as well as technical reports from former students on the topic. If you feel unsure whether you fulfill the requirements, but do like the project, send an email to the mentor and ask for a small programming exercise.

Workplan:

Week – Work package

  1. Training and introduction to FMMs and hardware
  2. Benchmarking of kernel variants on the CPU/GPU/accelerator
  3. Initial port to HIP of one or two kernel variants
  4. Unifying the octree data structure for the CPU/GPU/accelerator
  5. Implementing multi-GPU support with HIP
  6. Optimization and benchmarking, documentation
  7. Optimization and benchmarking, documentation
  8. Generation of final performance results. Preparation of plots/figures. Submission of results.

Final Product Description: 

The end product will be an extended FMM within HIP that supports CPUs/GPUs/accelerators. The benchmarking results, especially the gain in performance, can be easily illustrated in appropriate figures, as is routinely done by PRACE and HPC vendors. Such plots could be used by PRACE.

Adapting the Project: Increasing the Difficulty:

The kernels are used differently in many places in the code. For example, it may or may not be required to use a certain implementation of an FMM operator. A particularly able student may also port multiple kernels. Depending on the knowledge level, a larger number of access/storage strategies can be ported/extended, or performance optimization within the HIP framework can be intensified.

Resources:

The student will have his/her own desk in an air-conditioned open-plan office (12 desks in total) or in a separate office (2-3 desks in total). He/she will get access (and computation time) on the required HPC hardware for the project and have his/her own workplace with a fully equipped workstation for the time of the program. A range of performance and benchmarking tools are available on site and can be used within the project. No further resources are required. Hint: We do have experts on all advanced topics, e.g. C++11/14/17 and CUDA, in house. Hence, the student will be supported when battling with ‘bleeding-edge’ technology.

Organisation:
Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH

Project reference: 1718

Climate research is a major user of supercomputing facilities; however, the spatial resolution of climate models is much coarser than that of similar models for weather forecasting, and thus they no longer scale as the computers become bigger. Instead, climate research is highly dependent on increasing the time a simulation covers, into hundreds of thousands of years, with a time resolution of six hours. Thus, for climate research to improve, the field needs faster computers, not bigger computers.

This means that accelerators such as GPGPUs, Xeon Phis, and FPGAs are very interesting for climate research.

The project identifies two computational kernels that should be ported to run on accelerators:
⦁ Ocean density solver
⦁ Barotropic solver

Global projections of surface temperature monthly mean for January 2099.

Project Mentor: Mads R. B. Kristensen

Site Co-ordinator: Brian Vinter

Learning Outcomes:

Learn the performance and programming characteristics of different accelerators, particularly GPGPUs and Xeon Phis.

Identify code that is suitable for a specific accelerator.

Student Prerequisites (compulsory): 

Basic Linux skills

Elementary knowledge of parallel computing

Student Prerequisites (desirable): 

Experience with OpenCL/CUDA and OpenMP

Training Materials:

 

The memory layout of the 2nd generation Intel® Xeon Phi™ processors code-named Knights Landing (KNL):

MCDRAM as High-Bandwidth Memory (HBM) in Knights Landing Processors: Developer’s Guide

OpenCL developing guide:
http://developer.amd.com/tools/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk

Workplan:

Week 1: Training
Week 2-3: Study of numerical solvers for climate simulations
Week 4-7: Identify and port code that will benefit from accelerators
Week 8: Finalise Report

Final Product Description: 

The final product is accelerated numerical solvers for climate simulations

Adapting the Project: Increasing the Difficulty:

Include more numerical solvers and accelerators

Resources:

The student will need access to machines with:
⦁ GPGPUs
⦁ Xeon Phis
which we will provide.

Organisation:

Niels Bohr Institute University of Copenhagen

Project reference: 1719

Europe has several synchrotron facilities, including ESRF, PSI, and MAX-IV. These facilities produce ever more data. Medical beam lines in particular produce enormous datasets: they collect 3D volumes at a frequency that produces 3D movies. For example, volumes of 2560^3 voxels collected at 15 Hz in a 10-minute experiment produce 9,000 volumes with a total size of more than 300 TB. The analysis of this data typically involves tracing one or more objects, such as a heart valve or lung tissue, in 3D over time.

Existing algorithms are typically implemented in Matlab and do not scale to the new data rates. Thus the project involves selecting a single algorithm that should be implemented in a high-performance version. Because of the large datasets, conventional processors using MPI are the most straightforward technology choice for this project.

3D visualization of the flight muscles (from http://www.nature.com/articles/srep08727)

Project Mentor: Brian Vinter

Site Co-ordinator: Brian Vinter

Learning Outcomes:

Learn different tracing algorithms and how to parallelize them using MPI

Student Prerequisites (compulsory): 

Basic Linux skills

Elementary knowledge of parallel computing

Student Prerequisites (desirable): 

Experience with MPI and OpenMP

Training Materials:

X-ray microscopy
http://www.nature.com/articles/srep08727

MPI
https://computing.llnl.gov/tutorials/mpi/

OpenMP
https://computing.llnl.gov/tutorials/openMP/

Workplan:

Week 1: Training
Week 2-3: Study of tracing algorithms
Week 4-7: Implement a parallel version of the tracing algorithm
Week 8: Finalise Report

Final Product Description: 

The final product is a parallelized implementation of 4D tracing

Adapting the Project: Increasing the Difficulty:

The student could explore a hybrid programming approach by implementing a version that both uses shared memory parallelism (OpenMP) and distributed memory parallelism (MPI)

Resources:

The student will need access to an MPI cluster, which we will provide.

Organisation:

Niels Bohr Institute University of Copenhagen

Project reference: 1720

In computational biology we typically have p ≥ 3 sets of data points P1, P2, …, Pp (e.g., sets of different biological objects including proteins, genes, drugs, diseases, etc.) with |Pi| = ni. For each pair i < j we have a data matrix Dij containing relations between data points from Pi and Pj (inter-type connections, e.g., drug-gene), while for i = j the data matrices Dii contain intra-type connections (e.g., gene-gene or drug-drug interactions).

A way to mine all these data sets simultaneously is to solve the non-negative matrix tri-factorization problem, which can be formulated as follows:
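A standard statement of this objective, given here as a sketch that may differ in detail from the project’s exact formulation (e.g. regarding additional constraints or the intra-type matrices Dii), is

\[
\min_{G_i \ge 0,\; S_{ij}} \;\sum_{i<j} \bigl\lVert D_{ij} - G_i S_{ij} G_j^{T} \bigr\rVert_F^2 ,
\qquad G_i \in \mathbb{R}^{n_i \times k_i},\;\; S_{ij} \in \mathbb{R}^{k_i \times k_j},
\]

where the non-negative factors G_i act as soft cluster indicators for each data set and the middle matrices S_ij encode the relations between clusters.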

Quite some effort has been devoted to solving this hard optimization problem, mainly using the fixed point method. We will take the best existing academic code, parallelize it with C++ and OpenMPI, and test it on a local HPC machine.

Results will be tested on real data from biomedicine.

Non-negative matrix tri-factorization (NMTF) applied to data associating genes, diseases and drugs can help (i) reconstruct the data matrices Dij, (ii) co-cluster the data sets, and (iii) detect new associations (triangles of connections) between the data sets.

Project Mentor: Prof. Janez Povh, Ph.D.

Site Co-ordinator: Leon Kos, Ph.D.

Learning Outcomes:

  • Fixed point method to solve optimization problems
  • C++ and OpenMPI programming on HPC

Student Prerequisites (compulsory):

C++ coding

Student Prerequisites (desirable): 

Basics from mathematical optimization or data science

Training Materials:

See attached paper:

  1. N. Pržulj, N. Malod-Dognin, Network analytics in the age of big data. Science, 353(6295):123–124, 2016

Workplan:

  • Week 2: study of the method and design of the development environment
  • Week 3: programming the method.
  • Week 4: testing and improving the code
  • Week 5: testing the method on real data
  • Weeks 6-7: writing the final report and the presentation
  • Week 8: Wrapping up.

Final Product Description: 

The final result will be used within a research project that is already running at the hosting institution and will be incorporated into the dissemination activities of that project.

Adapting the Project: Increasing the Difficulty:

There are several extensions of the underlying optimization problem which we can start working on.

Resources:

The student needs basic knowledge of C++. During the project he/she will get access to the local HPC machine.

Organisation:
University of Ljubljana

Project reference: 1721

CAD geometry is a necessary input for running a numerical simulation. However, CAD geometry from the design office is not suitable for direct use in numerical simulation. This is due to CAD complexity, with many small details such as fillets, small edges and faces. Simplification and extraction of the CAD data are necessary before the numerical simulation. The simplification and extraction of the CAD data will be done in the open source software Open CASCADE. The numerical simulation will be performed with open source software such as OpenFOAM. The main goal is to prepare the geometry for the numerical simulation in a programmatic way using Open CASCADE. With this project, a complete method of CAD extraction, meshing and numerical analysis will be implemented in a programmatic way. Python and C++ will be used as programming tools.

Project Mentor: Marijo Telenta, Ph.D.

Site Co-ordinator: Leon Kos, Ph.D.

Learning Outcomes:

  • Utilization of HPC;
  • Using open source software
  • Utilizing the HPC resources
  • Visualization of data

Student Prerequisites (compulsory):

Proficiency in Bash and strong programming background in Python and C++.

Student Prerequisites (desirable): 

Experience with CFD.

Training Materials:


http://www.cfd-online.com/

http://hpc.fs.uni-lj.si/

http://www.openfoam.com

Workplan:

  • Week 1: Training week
  • Week 2: Literature review (plan writing)
  • Week 3-6: Project development
  • Week 7: Optimization, validation, and visualization
  • Week 8: Final report write-up

Final Product Description: 

Results will be published as a report and the data will be visualized and presented with figures and video materials.

Adapting the Project: Increasing the Difficulty:

More complex geometry will be analysed.

Resources:

HPC Prelog, OpenFOAM, ParaView, Python, Mathematica, Inkscape, Gimp, LaTeX, Open CASCADE. The student will have his/her own desk with a desktop computer in the office.

Organisation:
University of Ljubljana

Thursday, December 1, 11:00 am

From the left: Sergio Bernardi, Marta Čudova, Anurag Dogra, Florian Berberich, Leon Kos, Sanzio Bassini

Summer of HPC is a PRACE programme that offers university students the opportunity to spend two months of the summer at HPC centres across Europe. The students work on projects related to PRACE work, using HPC resources, with the goal of producing a visualisation or a video.
At the end of the activity each year, two projects out of the 20 participants are selected and awarded for their outstanding performance.

The programme of the award event includes presentations given by a PRACE Board of Directors representative, the PRACE Project PMO, and the Chair of the SoHPC award committee. The two winners will be given the opportunity to present their projects and results.

Cineca organizes a meeting to present Summer of HPC, the transnational mobility programme promoted by PRACE. As host of the award ceremony, Cineca wishes to present to the Italian scientific community the interesting opportunity offered by SoHPC to students who intend to spend a period of study abroad in the context of high performance computing.

Agenda:

  • Welcome speech by Cineca’s Sanzio Bassini.
  • The commitment of PRACE to dissemination, Sergio Bernardi (Board of Directors, PRACE AISBL).
  • SoHPC project: aims and facts, Florian Berberich, Project Management Officer of PRACE.
  • Prizegiving, Leon Kos, PRACE Summer of HPC 2016 coordinator.
  • Presentations of the awarded students: Anurag Dogra, Marta Čudova.

For more details, see the announcement “Summer of HPC 2016: Awards Ceremony at Cineca” on the PRACE-RI website.

In my last blog post, I described the general MapReduce paradigm, which allows us to create big data processing systems in a fairly simple way. In this blog post, I will describe in more detail what I’m doing in my project, which big data framework I have chosen and what the motivations for the project are.

We live in the era of big data. More than 50 million tweets are sent every day, 20 hours of video content are uploaded to YouTube every minute, and every second almost 3 million emails are sent. The big data revolution does not omit scientists and researchers. For instance, the Large Synoptic Survey Telescope, which is under construction in Chile, will take high-resolution photos of the sky. The quality and size of each photo will be so high that we will need 30 TB of disc space to store such an image. How much data is that? Well, the human genomes of 30 thousand people or 60 million books are of similar size. Do you think that is much? In 2020 it is planned to activate the Square Kilometre Array, which will generate 1 048 576 TB of data every second. It is impossible for a scientist to analyze such huge amounts of data by hand, which is why there is an increasing interest in automatic knowledge discovery techniques for scientists.

One of the ideas to make the life of scientists easier is the concept of Literature-Based Discovery (LBD). In LBD we assume that a scientist is not aware of every single paper which appears in his discipline. What’s more, if some discoveries could be made at the intersection of several disciplines, then usually a researcher is aware of the research in only one of them. So it seems possible (and it has been confirmed by research practice) that the discoveries are already there in the huge databases of scientific papers, but they go unnoticed. Let me give you a very general example. One researcher in his paper claims that A is related to B, and a second researcher claims that B is somehow associated with C. However, no one is aware of the apparent connection between A and C. Such discoveries are just waiting to be found – no stroke of genius is required!

In my project, we concentrate on the exploration of MEDLINE, a database of medical scientific papers. MEDLINE is the largest bibliographical database in the world and it’s freely available. Each MEDLINE entry is annotated with around 12 Medical Subject Headings (MeSH) terms which aim to describe the content of the article in a concise way. We are using MeSH terms to construct our knowledge graph and then we use link prediction techniques to find new, promising connections between terms in the graph. A medical researcher seeing, for example, the system’s proposal of a connection between some disease and a medicine can be inspired to perform further investigations to check whether the proposed drug really cures this illness. In this way, hopefully, our system will help to make progress in medicine.
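To illustrate the idea only (our actual scoring runs at scale in Apache Flink and may use different measures), one of the simplest classical link-prediction heuristics counts the common neighbours B shared by two so-far unconnected terms A and C:

```cpp
// Common-neighbour link prediction on a term graph (illustration only).
#include <string>
#include <unordered_map>
#include <unordered_set>

using Graph = std::unordered_map<std::string, std::unordered_set<std::string>>;

// Score a candidate link (a, c): the number of terms b connected to both,
// i.e. the number of "A-B and B-C" bridges described above.
std::size_t common_neighbours(const Graph& g, const std::string& a, const std::string& c) {
  const auto ita = g.find(a);
  const auto itc = g.find(c);
  if (ita == g.end() || itc == g.end()) return 0;
  std::size_t count = 0;
  for (const auto& b : ita->second)
    if (itc->second.count(b) > 0) ++count;
  return count;
}
```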

Since there are millions of possible connections in the graph, I implement the calculation in Apache Flink. Flink is an open source big data processing engine which greatly extends the MapReduce paradigm. A characteristic feature of Flink is that it treats data as a data stream. Even if you work with a normal, static dataset, Flink sees it as a finite data stream of constant size. Such a view of processing data allows Flink to perform more optimizations than standard big data frameworks. In particular, Flink does not need to wait for the termination of the previous stage of calculations to start the next ones. Since the previous stage’s output is a data stream, just after the first result is produced it can be directly processed by the next stage without waiting for the rest of the results.


Our data flow modeled in Apache Flink

In our system Flink performs all the calculations and the resulting file is then imported into a webapp called LBDream. LBDream is written in the Python-based Django framework and it uses a bunch of other technologies like D3.js, Twitter Bootstrap, PostgreSQL etc. The main feature of the application is a search mechanism which allows users to explore the results of our calculations. We hope to make LBDream accessible to all medical researchers around the globe soon.


Screen of a very beta version of LBDream

How to MPI

So… in order to deal with the segmentation fault, Katerina made changes to the allocations… not only in the makefiles, but in the input files too… And that defeated the segmentation fault.

But now she should start parallelizing the code. So the first thing she needs to do is to see what should be parallelized… When she goes through the code and sees how many routines have to be parallelized to optimize the code, she gets pretty anxious… ‘This is too much. There are thousands of lines of code! I can’t make it,’ she thinks. But then Jozef, the co-creator of this monstrous code, helps her take it easy… ‘Start with these routines and we’ll see how this works out by parallelizing one thing at a time,’ he says. And that is what she does… Katerina figures out which loops take longest to perform the calculations included in the routines…

‘I’d better start from the outer loop. There will be fewer implications. And I will probably be able to cope with the inner loops as soon as I succeed with the first one. So I’d better do this:

First of all, I need to include mpif.h in my subroutine. Hm… As far as I can tell, within this part of the code some files are opened and closed. This should be done only on the master to improve the performance and make sure that no information gets lost. What else… I should also not forget to broadcast the commons that are used in other parts of the code, since otherwise wrong values of the parameters may occur…

After that, I should write a slave subroutine, which is going to call the routine I am modifying within it. This can be tricky… So I’d better see in which other parts of the code my subroutine is called. I have to do this to be sure that I am passing the right variables into the slave routine. That sounds like a good plan so far.’

Of course, through the process she sees that she has to allocate in the slave routine some of the variables of the subroutine she is trying to parallelize, because they turned out to be work arrays. But that wasn’t a problem. The monstrous code accepted these changes almost gently, since only some compilation errors occurred. Even when she used MPI_Bcast both in the subroutine she is modifying and in the slave routine she wrote, through allocatable send buffers, there were only some minor compilation errors.

She was so pleased that she kept going, thinking that everything was going to be just as easy… She modified the loop so that each iteration could be performed on a different processor. And after the end of each computation, the MPI_Gather command would send the results to the master… But of course that didn’t turn out very well…
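For readers who want to picture the pattern she is describing, here is a minimal C++ sketch (the real code is Fortran, and the names below are invented for illustration) of broadcasting shared parameters, spreading loop iterations over the ranks and gathering the per-iteration results on the master:

```cpp
// Minimal master/worker loop distribution with MPI (illustration only).
#include <mpi.h>
#include <vector>

// Stand-in for the real per-iteration computation.
double heavy_iteration(int i, double param) { return param * i; }

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, size = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  double param = 0.0;
  if (rank == 0) param = 1.234;                  // e.g. read from an input file
  MPI_Bcast(&param, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  const int n_per_rank = 4;                      // iterations handled by each rank
  std::vector<double> local(n_per_rank);
  for (int k = 0; k < n_per_rank; ++k)
    local[k] = heavy_iteration(rank * n_per_rank + k, param);

  std::vector<double> all;
  if (rank == 0) all.resize(static_cast<std::size_t>(n_per_rank) * size);
  // Every rank must reach this collective call, otherwise the program hangs.
  MPI_Gather(local.data(), n_per_rank, MPI_DOUBLE,
             all.data(), n_per_rank, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  MPI_Finalize();
  return 0;
}
```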

While the program was executing, it hung, even though the serial version was running (she could see that via the #ifdefs she used)…

But why..?

Nanotube from the simulation

To Be Continued..

Welcome everyone. Today I will walk you step by step through how I use Blender in this project.

This post will contain many pictures, as the content is pretty graphic, but they have been slightly modified for simplicity and to keep the focus on the areas I wanted to explain. Most of them can be clicked to enlarge them.

I’m working mainly in Unity, but to generate the 3D models we need a different tool. Blender is the chosen one for simple reasons: it is free and open-source. This allowed the IT4I team to develop a custom plugin to satisfy their needs. In the next picture we can observe the basic layout split into two parts: the 3D View on the left, where we will see the 3D model once it is generated, and the UV/Image Editor on the right, where the scans will be displayed as 2D images.


The next step is to select and load a DICOMDIR, which is a file that contains all the data for the DICOM files. Those DICOM files are basically pictures obtained by Computed Tomography from a real patient.

The CT scans are oriented by the anatomical planes, which means that they are taken along three axes:

  • Sagittal: Y-Z plane. It goes from the left part to the right part of the body.
  • Coronal: Y-X plane. It goes from the back part to the front part of the body.
  • Axial: X-Z plane. It goes from the lower part to the upper part of the body.

It might be difficult to picture it in our minds, so it’s easier to show it.

From left to right and top to bottom: Sagittal, Coronal, Axial

Once the DICOMDIR is loaded, many more features are available to help extract the information we need from the patient, but they will appear in the order they should be applied.

– The first three buttons determine from which axis the scans will be visible in our UV/Image Editor.

– The axis sliders will help the user navigate through the scans. You can input a number directly or use it as a normal slider.

 

– The contrast and brightness sliders are self-explanatory. Sliding them to the right increases the value of contrast or brightness, and sliding them to the left decreases it.

 

 

– Cutting the image can be useful for removing unnecessary parts from it. It will decrease the time it takes to process the images, as they will have smaller dimensions.

 

– Applying blur to the image will help reduce the noise. This results in less sketchy areas to be flooded later and smoother K-means labelling.

 

– Thresholding the image will help to nullify data from a range. To extract the lungs, some tissues with clear differences in the scan can be omitted. As the image is in grey scale, RGB has the same value in all three channels, so only one number is necessary. Checking the lung colours, you can see that they never exceed 0.16, so by setting the high threshold to this number we remove clearly unnecessary tissues, like bones for example.

 

 

 

– Applying the K-means method will basically make groups from similar parts of the images and paint every group with a different colour. The more labels, the more colours. A range where the K-means will be applied can be defined as well.

It’s hard to explain because I’m not an expert either, but another image will work wonders to help understand it.

 

From left to right: 2, 4 and 6 labels
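For the curious, below is a minimal sketch of K-means labelling on grey values (an illustration of the idea only, not the plugin’s actual implementation):

```cpp
// 1D K-means on grey-scale intensities in [0, 1] (illustration only).
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<int> kmeans_labels(const std::vector<float>& pixels, int k, int iterations = 20) {
  // Spread the k cluster centres evenly over the intensity range.
  std::vector<float> centre(k);
  for (int c = 0; c < k; ++c) centre[c] = (c + 0.5f) / k;

  std::vector<int> label(pixels.size(), 0);
  for (int it = 0; it < iterations; ++it) {
    // Assignment step: each pixel gets the label of its nearest centre.
    for (std::size_t i = 0; i < pixels.size(); ++i) {
      float best = std::fabs(pixels[i] - centre[0]);
      label[i] = 0;
      for (int c = 1; c < k; ++c) {
        const float d = std::fabs(pixels[i] - centre[c]);
        if (d < best) { best = d; label[i] = c; }
      }
    }
    // Update step: move each centre to the mean intensity of its pixels.
    std::vector<float> sum(k, 0.0f);
    std::vector<std::size_t> count(k, 0);
    for (std::size_t i = 0; i < pixels.size(); ++i) {
      sum[label[i]] += pixels[i];
      ++count[label[i]];
    }
    for (int c = 0; c < k; ++c)
      if (count[c] > 0) centre[c] = sum[c] / count[c];
  }
  return label;  // one label ("colour") per pixel
}
```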

 

 


– Once the K-means is done, it’s time to select a seed (one of the colours) by pointing at it and pressing Flood from kmeans. You can flood up to two seeds at the same time, and seed something inside a seed. Also, the floods store their inverse, so you could flood the lungs and extract everything but the lungs.

 

 

 

 

 

 

 

On the left side the seed has been flooded. On the right side the boundaries have been created.

 

 

– The next step is to create boundaries: the plugin detects the edges of the flooded surface and draws them.

– And the last step is to perform the Poisson reconstruction. We can set the values for depth and smoothness here. Higher values will make the processing time longer, but that’s why we are using a supercomputer!

 

 

 

And voilà, a 3D model magically appeared after some delay. Tweaking the values during the process can always make it more accurate, smoother, etc. It’s a matter of playing around with the plugin for a while. This 3D model has been made for this tutorial; it doesn’t represent the outcome of the game.

Example of final result

In the next episode we will try to put everything together in Unity to leave our project ready for the presentation.

Stay tuned for more!

Featured image by Kristoffer Trolle.(CC BY 2.0 without modifications)

Assignare Locus, the computer scientist’s equivalent of Expecto Patronum.


This is the final blog in the trilogy; I am going to summarise the rest of the exciting experience I had in Slovenia along with a full description of the final output of my project.

I will begin with the exciting part. The project involved two key parts: firstly, to automate the Computational Fluid Dynamics (CFD) workflow that many engineers currently do manually, and secondly, to use this script/code to consider the effects small details in the geometric CAD model can have on simulation results.

For those of you that are thinking “CFD” sounds like one of “them text acronyms” those pesky kids are using these days, I have attached an image below summarising the steps that are required in an average CFD problem.

This project involved three pieces of software: OpenCASCADE, OpenFOAM and ParaView. As I mentioned in the previous blog, OpenCASCADE was used to build a series of geometric models, and subsequently OpenFOAM was used to simulate the fluid motion before using ParaView to look at the beautiful results. Given that a picture paints a thousand words, I’ll shut up and hopefully the image below will answer all the questions you may have concerning this workflow, such as, “Wow Sam, you seem like a really switched on guy with great hair and a lovely personality, please explain to me how this process is automated?”

 


 

All these processes are scripted, so each section requires the user to type in one command.

STEP 1 : The OpenCASCADE script developed builds an STL file of the geometry, which is automatically copied into the correct directory ready for OpenFOAM to mesh it (case/constant/triSurface).

STEP 2 : The Snappy Hex Meshing instructions are listed in a file (snappyHexMeshDict) and define the constraints for the boundary layers and overall mesh structure. This algorithm is then run to mesh the specified domain.

STEP 3 : The simulation is then run using a given solver that is defined in the simulation control dictionary, relevantly named “controlDict”.

STEP 4 : Look at your amazing results in ParaView.


You have seen the results but may still be thinking, ‘I don’t really know what I’m looking at’. In many engineering problems all we need to do is solve a series of equations; in fluid mechanics they are known as the Navier-Stokes equations, and these equations model the pressure and velocity of the fluid.
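For reference, in their incompressible form (the form typically solved for flows like this one, with body forces omitted) the equations read

\[
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}
= -\frac{1}{\rho}\nabla p + \nu\,\nabla^{2}\mathbf{u},
\qquad \nabla\cdot\mathbf{u} = 0,
\]

where u is the velocity field, p the pressure, ρ the density and ν the kinematic viscosity.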

Above are some simple Large Eddy Simulations (LES) of a range of wind barrier angles; LES is a method of modelling the complex turbulence in the flow. I hope you enjoy the colours, “rainbow desaturated” being my favourite. If you are thinking what the donuts is LES: as mentioned earlier we are solving the Navier-Stokes equations, however these can be quite demanding to solve, so we simplify some of the more complex “stuff”, i.e. the chaotic flow going on in the fluid, using a turbulence model. There are a number of turbulence models available, with LES being one of the more complex.

Maybe I’m not the best at squeezing things into a word limit, but I hope the above has given you an idea as to what CFD is and why we want to automate these processes.

Just before I bid the Summer of HPC a farewell I would like to leave you with some more images of my time spent here, and to say thank you to everyone involved, including my supervisors Marijo and Leon, and everyone at LECAD lab who put up with me accidentally running a slightly large simulation on the login node and slowing everything down.

Choosing Greece and BRFAA, Zoe Cournia’s lab, for my PRACE Summer of HPC destination was probably one of the best decisions of my life. To be honest, the project was pretty difficult sometimes and I had a few days of complete desperation, but I can honestly say that I’ve learnt a lot and, now that it’s over, it was completely worth it and a great adventure. But let’s not bore you to death – I wrote a lot about the project before, and in my last post I promised to tell you about the summer part of the HPC program.

Giannis, Me, Juan and Juli enjoying a short break at work.

Juli & Juan B-Day celebration. The work team.

And that was an intensive one! As you would expect, Greece has a lot to offer. Being here for almost two months I didn’t experience a single day of boredom (OK…there was one Sunday I spent in bed all day. I was not bored though, actually felt pretty overwhelmed that day – very busy with getting my life back after trying ouzo the night before. ‘Let he who is without sin cast the first stone’).

I was lucky to have wonderful flatmates too! And these here are only a part of them.

Moussaka time with my flatmates.

I can not truly recommend Ouzo, did not (and will not) try Rakija, but Greeks are Masters of coffee (did you know that Frappe is from Greece?) and food. Oh my! What delicious things I’ve eaten here. Greek salad and Tzatziki are something everybody knows Greece for, but there is a lot more than that. Among many, Tiropita, Souvlaki, my most beloved Moussaka and obviously desserts, starting from Baklava (which is basically 150% sugar), are definitely a must when you’re in Greece. It is also amazing that during these two months, not once did I eat lunch alone. At work I used to eat with my colleagues, at home with my flatmates, and during the trips with other travel companions (which usually consisted of some of the flatmates and/or some of the colleagues).

Speaking about traveling… don’t ever think about going to Greece for a weekend trip. Not only does Greece deserve your whole vacation, but there is no possibility you will be satisfied with just 2 days here. Based on my experience I calculated (trust me, I’m a scientist) that 7 days is the ABSOLUTE MINIMUM to see just the most essential things here.

I will not tell you about all the places I’ve been to (although they were all worth it), as this would be too long to read, but I feel obliged to tell you about the most beautiful of them. Actually, I plan to come back here with my friends one day and already have my plan to show them around, and maybe some of you will find it useful too. The plan is for a minimum of 7 days, but extending it would be a piece of cake; changes to the order are possible, but it is optimized for maximum efficiency with minimum costs.

Just a random view on the way from Delphi to Itea

Day 1: Rent a car at the airport and, skipping Athens, go straight to Meteoron. That’s the farthest point you will go and although the way is pretty long (it took us about 6 hours), the place is totally worth it. Absolutely unique. If you have some time during day 1 (you came on an early plane) – chill in the city and enjoy the sunset (ask locals where, they will be happy to help!) – it is beautiful there, as you can see in the picture below. These buildings on the rocks are actually the Monasteries, and you can (should) visit most of them (try going in the morning hours of the second day, and remember to dress properly, legs and arms covered). Local food alert: onion cheese pie.

we didn’t make it to the proper spot, and the sunset could be much more spectacular, not to mention my photography skills

Day 2: You are in Greece and haven’t seen any beautiful beaches yet? That’s a shame! After visiting the monasteries, go to Lamia. We don’t plan to spend too much time there – just chill for an hour at the beautiful beach, get some food and move to your hotel near the next point, Delphi, and get some proper sleep there, because day 3 will be a long one.

Day 3: We started early in the morning, and it was good, because later it got definitely too hot to walk around outside – and Delphi is mostly walking around with the sun shining right into your face. Delphi is the center of the world, one of the most popular ancient destinations in Greece with incredible views around. After a few hours, when you’re done with the mandatory part of the trip, the Oracle, have seen all the temples, learnt about the history and fallen in love with Greece (that’s where I did that), you should definitely go to the beach in Itea and, after chilling there, back to Athens. I personally recommend stopping in Chalkida on the way.

Day 4: Tired? Good. We’re on a marathon here. Day 4 is the Athens day. Breakfast, the Acropolis, Parthenon, Plaka, Theater of Dionisos, the Acropolis Museum, lunch, a walk at Monastiraki, Ancient Agora, Syntagma square (the changing of the guards by the Parliament building is my personal favorite), sunset at Lycabettus mountain, dinner (for example here) and a drink in one of many roof bars. If you still have energy, see the night life of Athens – go to Gazi (remember, parties in Athens start at 2 a.m., not earlier).

Random View in Greece. Get used to it, because it's that beautiful everywhere here.

Day 5: Now the most beautiful part – the islands. I’ve been to the two closest of them, but you can choose from a variety, especially if you have more time. My choices were Aegina and Agistri. The fun part is that they’re both beautiful (if for some reason you have to choose just one – take Agistri) and they are really close to each other. The perfect plan is to go to Aegina first (go to Piraeus and then 1 hour on a ferry), spend the whole day there, do some sightseeing, go to the beach, and the next day go to the second island (the ferry from Aegina to Agistri takes 10 minutes). On Aegina it is really worth renting a motorbike, because the distances are too big to walk and buses do not run very often. DON’T TRY RENTING A BICYCLE (I did that) if you’re not in the shape of your life – the hills are pretty exhausting. Local food alert: nuts.

Day 6: The next day in the morning, take the 10-minute ferry to Agistri. My personal favorite. I don’t really know if there is anything else but beaches there (ignorance…), but these are a w e s o m e. I did the island on foot/by bus and it was perfectly fine. Skala, the first beach you will see, is not worth it if you’ve got only one day. Go to Chalikiada first, and then Dragonera. If there is some time left – Mariza is also one of the most beautiful. In the evening go back to Athens.

Mont Parnes

Day 7: Since you’ve been resting on the beaches and eating nuts for the last two days, you should burn those extra calories now. Day 7 is for mountains. Did you see these nice peaks around Athens? This is Parnitha. Take a bus to Thrakomakedones in the morning and let the fun begin. We went to probably the most popular one, Parnes, but you can choose from a wide variety of peaks. If you are too lazy to walk but still want to see the views – there is a (free) ropeway to the casino on the mountain (yes, not a hotel, not a restaurant. A casino). That is the mountain we did and I guarantee you will enjoy it. If you decide to go on foot, as we did in the beginning – remember to follow the trail and respect the mountain.

In my previous article you can find a brief introduction to the In Situ Visualization Technique, which allows exploring simulation data during run-time without involving the storage resources of a supercomputer. Now it’s time to go a little bit further and focus on how to integrate in situ analysis capabilities with your own code. In this article I would like to show you what it takes and whether it is worth putting the effort into it.

Recently, significant work has already been done on the development of several in situ solutions that can be directly embedded into the simulation code, so there is no need to rewrite it from scratch. Here I will look closer at ParaView, a popular multi-platform scientific data analysis and visualization environment which is distributed under an open source license. While it has been demonstrating its qualities in post-processing of extremely large datasets for a long time, thanks to the Catalyst library ParaView now also belongs among the currently available in situ visualization tools.

Catalyst, a relatively new component of ParaView, has been designed for fast integration with numerical codes and for performing real-time analysis of generated data. It changes the traditional three-step simulation workflow. Here you first specify which data you would like to see and analyse in situ. For this purpose Catalyst uses pipelines that are executed during the initial phase of the numerical simulation. In these pipelines, you can utilize all the post-processing capabilities which ParaView offers. In other words, you select the data which the simulation produces, then apply filters such as slices, streamlines or iso-surfaces and finally choose what should be dumped for deeper investigation. This way, the output can be significantly reduced because the processed elements, which carry all the information you are interested in, are much smaller than the full datasets.

Since ParaView is built on the standard visualization toolkit VTK, the simulation’s internal data structures have to be transferred into VTK data structures. This is done via so-called adaptors. An adaptor is a simulation interface which should be kept separate from the code, in order not to disturb it and to simplify the build process. At the end of the day, you have to call only three functions of the adaptor from the original code. The first one, which is called only once per simulation run, initializes Catalyst and loads the pre-configured pipelines. The second one creates the VTK grids, appends the computed attributes to them and dumps the selected elements with a frequency specified by the user. And the last function is used at the end of the simulation to release all Catalyst resources.
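To give a feeling for what such an adaptor looks like, here is a minimal C++ sketch based on the Catalyst adaptor API of that generation (the simulation-side function names are invented, and the concrete grid type depends on the simulation):

```cpp
// Minimal Catalyst adaptor sketch (illustration only); the three functions
// correspond to the three calls described in the text.
#include <vtkCPDataDescription.h>
#include <vtkCPInputDataDescription.h>
#include <vtkCPProcessor.h>
#include <vtkCPPythonScriptPipeline.h>
#include <vtkImageData.h>

static vtkCPProcessor* processor = nullptr;

// 1) Called once at start-up: initialize Catalyst and load a pipeline script.
void CatalystInitialize(const char* pipelineScript) {
  processor = vtkCPProcessor::New();
  processor->Initialize();
  vtkCPPythonScriptPipeline* pipeline = vtkCPPythonScriptPipeline::New();
  pipeline->Initialize(pipelineScript);   // the pre-configured ParaView pipeline
  processor->AddPipeline(pipeline);
  pipeline->Delete();                     // the processor keeps its own reference
}

// 2) Called every time step: hand the current grid and fields to Catalyst.
void CatalystCoProcess(vtkImageData* grid, double time, int timeStep) {
  vtkCPDataDescription* description = vtkCPDataDescription::New();
  description->AddInput("input");
  description->SetTimeData(time, timeStep);
  if (processor->RequestDataDescription(description) != 0) {
    // Only attach the (possibly expensive) VTK grid when the pipeline needs it.
    description->GetInputDescriptionByName("input")->SetGrid(grid);
    processor->CoProcess(description);
  }
  description->Delete();
}

// 3) Called once at the end of the run: release all Catalyst resources.
void CatalystFinalize() {
  if (processor != nullptr) {
    processor->Finalize();
    processor->Delete();
    processor = nullptr;
  }
}
```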


Example of simulation connected to Catalyst. In the top-left window you can see the Catalyst sources and the datasets that are extracted to the server. Real-time results are then visualised in the main window.

Once you finish the procedure specified above and instrument your code with Catalyst, it is time to run the simulation. First you should run pvserver, the component which represents the server on which ParaView runs and processes all the data. Then you can connect to that server with the ParaView client and control the visualization remotely from your local machine or using the visualization nodes of a supercomputer. The main advantage of doing this is that pvserver can be executed in parallel, so with enough resources you can smoothly explore even extremely large datasets. Via the ParaView client you can then connect to Catalyst on the port number specified in the pipelines. Once you are connected, the pvserver waits for the simulation data. The last step is to execute the simulation. Notice that the order is not important; there is no problem connecting to already running jobs.

During the simulation, the user can see the size of the datasets that the simulation produces. But by default, none of the data is available on the server. The computationally expensive operations are done only on the user’s demand via the ParaView graphical interface. So the user can select data structures and analyse them the same way as in post-processing. But there is one difference: the simulation is running, so the user can observe the data as it is being generated. With Catalyst, it is also possible to pause the simulation or specify a break-point at a selected time step. This can be helpful if you expect some interesting behaviour of the investigated phenomena, or for identifying regions where numerical instability arises.

In summary, ParaView Catalyst is easy to integrate with already existing simulation code and offers really deep insight into the large amount of data corresponding to the simulated phenomena. If your code spends too much time dumping generated data, or if you contend with insufficient storage resources, ParaView Catalyst could be an elegant solution. In addition, in situ visualization is expected to enable a wide range of new interactive applications in the future.

 


Grand Canal Bay, Dublin

It’s the 2nd of September now, and I am back in the Czech Republic again. I left Dublin 2 days ago, after finishing the work on my Summer of HPC project at ICHEC. So, let’s have a look back now at what happened during the last few weeks in Ireland, and how my project turned out in the end.

After I created some 3D images of the shallow water simulation output using ParaView, I was still left with some time. There were several paths I could take to continue my work. One was using ParaView Catalyst for in situ visualisation, i.e. integrating ParaView directly into the simulation code, so no data copy is required between the simulation and ParaView. This would probably speed up the whole process of visualisation, since IO operations can be very time expensive. But there was another thing that I was far more tempted to do. Two years ago, I took a computer graphics course focused on OpenGL programming in C++. Therefore I decided to create my own visualisation tool based on OpenGL, and try to push the visualisation a bit further.

I was lucky that I could reuse most of my old code from that two-year-old course, which made my work a bit easier. But I still had to adjust it in many ways, and improve it so the application could load the simulation output and render it as a water surface. One of the advantages was definitely the fact that, unlike ParaView, my application could easily load the Fortran binary files, which is probably the fastest way of data transfer when using files. The problematic part was to construct a mesh from the height map, i.e. organise the vertices into triplets (triangle rendering) and calculate the normals of the surface, which are necessary for shading. This part is quite time consuming, and could be further optimised in the future.
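To illustrate that step, here is a minimal sketch (not my application’s code) of turning a height map into triangles with a per-face normal computed from a cross product:

```cpp
// Height map -> triangles + per-face normals (illustration only).
#include <array>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 cross(const Vec3& a, const Vec3& b) {
  return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

Vec3 normalize(const Vec3& v) {
  const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
  return { v.x / len, v.y / len, v.z / len };
}

// heights is a rows x cols grid of water heights; each grid cell is split
// into two triangles, and a normal is computed per triangle.
void build_mesh(const std::vector<float>& heights, int rows, int cols,
                std::vector<std::array<Vec3, 3>>& triangles,
                std::vector<Vec3>& normals) {
  auto v = [&](int r, int c) -> Vec3 {
    return { static_cast<float>(c), heights[r * cols + c], static_cast<float>(r) };
  };
  for (int r = 0; r + 1 < rows; ++r) {
    for (int c = 0; c + 1 < cols; ++c) {
      const std::array<Vec3, 3> t1 = { v(r, c), v(r + 1, c), v(r, c + 1) };
      const std::array<Vec3, 3> t2 = { v(r + 1, c), v(r + 1, c + 1), v(r, c + 1) };
      for (const auto& t : { t1, t2 }) {
        triangles.push_back(t);
        const Vec3 e1 = { t[1].x - t[0].x, t[1].y - t[0].y, t[1].z - t[0].z };
        const Vec3 e2 = { t[2].x - t[0].x, t[2].y - t[0].y, t[2].z - t[0].z };
        normals.push_back(normalize(cross(e1, e2)));
      }
    }
  }
}
```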


Shallow water visualisation using OpenGL

After loading the simulation output and converting it into a mesh which can be rendered by OpenGL, I moved to shader programming. A shader is in general a small program (written in GLSL in the case of OpenGL) which is run on the GPU. There are two basic types of shaders in the OpenGL pipeline – vertex and fragment shaders. A fragment shader receives one small part of the model mesh (called a fragment), information about its position, surface normal, texture coordinates and other attributes, and from those the shader calculates the resulting colour of this fragment. This process is executed for each fragment in the scene (in parallel, using the GPU), and the resulting colour field is displayed on screen.


The Temple Bar pub, one of the most popular pubs in Dublin centre

Water rendering has many specifics. They arise from the fact that water is transparent, and also that it has a different refractive index than air. That makes light rays hitting the water both reflect and pass through while changing their trajectory. The resulting effect is that we partially see the reflection of the objects above the water surface, and partially a deformed version of what’s below the surface. How big the reflective and transparent parts are depends on the angle at which the light ray hits the water surface. There are two laws describing this behaviour – Snell’s law and the Fresnel effect. So, it seems that there is no problem with water rendering, since the physics behind it is well described. The problem comes with performance. If we want to render a water surface that keeps changing quickly over time, there are plenty of calculations involved and some compromises are required.
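In real-time graphics the full Fresnel equations are usually replaced by Schlick’s approximation (a standard shortcut, not necessarily the exact form used in my shader):

\[
R(\theta) \approx R_0 + (1 - R_0)\,(1 - \cos\theta)^5,
\qquad R_0 = \left(\frac{n_1 - n_2}{n_1 + n_2}\right)^{2},
\]

where θ is the angle between the view direction and the surface normal, and n1 and n2 are the refractive indices of air and water.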


The Chapel Royal, and the Record Tower, the only part of the Dublin castle surviving from medieval times, built in 1228

In my application, I didn’t get particularly far with implementing the effects mentioned above, due to some technical difficulties I encountered with OpenGL. I did manage to implement basic transparency and also use the Fresnel effect to modify the level of transparency based on the angle of the water surface normal towards the view angle, which is already something that ParaView doesn’t support, at least not natively, and the result is a bit more realistic thanks to this. I am also very enthusiastic about improving my application and rendering nicer water, even though the Summer of HPC is already over. So, in the future, I might come up with a separate post somewhere on the internet to show my progress. Water rendering is definitely something very interesting and it offers many ways to experiment and play around with visualisation and GPU parallelisation.


The old graveyard in Glendalough, co. Wicklow

As my stay in Dublin quickly approached its end, even though I was busy finishing my work, I still dedicated some time to explore Ireland a bit more and enjoy its beauties while I could. I was lucky that I could meet up with my uncle, who has been living and working in Ireland for several years now, and who showed me around some of the most beautiful places in Dublin. We visited New Ross, a charming town in the south of Ireland. Then we drove up north, passing through Glendalough, where we visited an old Celtic cemetery. It’s difficult to describe the atmosphere at such places. The spirit of old times fills your soul with sacred astonishment, and together with gentle rain and mist covering the tops of the surrounding hills makes you believe you are part of something mysterious and beautiful. It’s a completely different world compared to the busy and cheerful city of Dublin. At places like this, you can find peace, balance, and lose yourself in nature’s embrace.


Christ Church Cathedral, Dublin – one of the two main city cathedrals

Full of impressions, we drove further north. Soon we were crossing the Wicklow Mountains National Park. We drove through narrow roads bending between endless bald hills without a soul to meet. This was yet another amazing experience, as we were literally in the middle of nowhere, surrounded only by a pure and scenic nature. On top of all that, my uncle let me drive throughout the whole trip, giving me a chance to experience driving on the opposite side of the road than I am used to. To be honest, it was less of a challenge than I thought it would be, but definitely a huge fun.


Papal cross in Phoenix park, Dublin

During the last few days I had left, I walked a bit more around Dublin. I finally got an opportunity to enter the famous Temple Bar pub (it doesn’t require any kind of permission, but it’s totally crowded most of the time) and listen to a live band playing traditional Irish music. I also visited Dublin Castle, which is located right in the centre of Dublin, but is surprisingly still difficult to find. One day I took a long walk through the massive Phoenix Park, which, besides other things, has a massive (35 m tall) cross, erected there in the 70s when Pope John Paul II visited Dublin and served an outdoor mass together with about 1 million people. This is one of the reminders that, despite both of its main cathedrals belonging to the Church of Ireland, the most widespread church in Ireland is Roman Catholic.

There are far more experiences and impressions I could include in this blog, but that would turn it into a small book. Those nearly two months I spent in Dublin were an amazing experience: I learned a lot, discovered even more, both work- and holiday-wise, and I will definitely consider visiting Ireland again.

The process of image reconstruction.

Hi there!

It’s almost the end of the summer holidays and the Summer of HPC, and I still haven’t managed to tell you everything about what I’m working on.

I was done with the CFD exercise so I started with a new one – image reconstruction. Actually, I was fiddling with this exercise for a while during the first week, within the HPC and MPI workshop, and implemented a serial and a few parallel MPI versions of the code in C. Two weeks ago, I came back to this exercise again and re-implemented it in Python. The original application could only reconstruct an image in grey scale. I improved it so the application can also reconstruct colour images. Let’s take a look at it!

Fig.1: Colour edge detection.

The input is an image processed by an edge detector (as you can see in Fig. 1). The algorithm of reconstruction is very simple. The approach is iterative and the new pixel values are calculated using each pixel’s neighbourhood and the corresponding pixel in the edge-detected image. The main aim was to parallelise the application using MPI (that’s not surprising). I started with a 1D decomposition of the problem – maybe you remember my last blog post where I explained what 2D decomposition means. 1D decomposition is a bit simpler – just cut the grid (the image in this case) into horizontal stripes, so that each processor can work on one stripe (see Fig. 2). You may ask me why I cut the grid into stripes in just this way. It’s related to the methods for managing multidimensional arrays in memory. Therefore we distinguish row-major and column-major order in programming languages. Python is row-major by default (if you are using numpy arrays, you can also change the order).

 

Fig. 2: 1D decomposition

Fig. 3: 2D decomposition

Figure 2 with the Loch Ness monster clearly illustrates row-major order as well as the importance of halo zone swapping to exchange data among processes during the calculation (if you don’t remember what that means, see my previous blog post) – when the processes do not exchange any data, white stripes emerge in the image. Such a picture can’t be reconstructed properly.
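To make the halo swap concrete, here is a minimal C++/MPI sketch of the exchange for a 1D row-stripe decomposition (my code is in Python; this fragment only illustrates the pattern). One common form of the update itself averages the four neighbouring pixels and subtracts the corresponding edge value, but the exact formula is not important here.

```cpp
// Halo (ghost-row) exchange for a 1D row-stripe decomposition (illustration only).
#include <mpi.h>
#include <vector>

// stripe holds (local_rows + 2) x cols values: row 0 and row local_rows + 1
// are halo rows, rows 1..local_rows are the data owned by this rank.
void swap_halos(std::vector<double>& stripe, int local_rows, int cols,
                int rank, int size) {
  const int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
  const int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

  double* top_halo    = &stripe[0];
  double* first_owned = &stripe[1 * cols];
  double* last_owned  = &stripe[local_rows * cols];
  double* bottom_halo = &stripe[(local_rows + 1) * cols];

  // Send my first owned row up; receive my bottom halo from the rank below.
  MPI_Sendrecv(first_owned, cols, MPI_DOUBLE, up,   0,
               bottom_halo, cols, MPI_DOUBLE, down, 0,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  // Send my last owned row down; receive my top halo from the rank above.
  MPI_Sendrecv(last_owned,  cols, MPI_DOUBLE, down, 1,
               top_halo,    cols, MPI_DOUBLE, up,   1,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
```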

I created an animation for you to illustrate how the algorithm works and how it converges to the final product. Check Fig. 4!

Fig. 4: Demonstration of how the algorithm works.

I tested several communication strategies, focused on blocking and non-blocking MPI point-to-point communication, and investigated the benefits of virtual topologies and rank reordering. The plot in Fig. 5 shows how the Python and C versions compare in performance.

Fig. 5: Image Reconstruction Performance Results

Meanwhile in Edinburgh – I’ve visited many new amazing places! I really love this city! Briefly and helter-skelter: not sure if you’ve noticed, but there’s a characteristic smell to the town – I find it like the smell of a brewery. Seagulls, again! Those birds never sleep. I can hear them all night. I ate a lobster (tasty, funny and a wonderful experience)! I was in North Berwick to see puffins (the small sea birds looking like small penguins). Unfortunately, it was too late, because puffins are there only till the end of July. However, I enjoyed the town and its lovely beach a lot. Then, I also visited the Pentland Hills – that day was very hot (a real summer day, unbelievable! My nose got slightly sunburnt). Sheep were following me at every single step. That’s not all. I visited a lot of new pubs and cafes, explored a new Sunday market with handmade stuff, clothes, food and drinks. I was in the historic Dean Village (a real village in the middle of the city) and the Royal Botanic Garden, which is quite huge and a very nice place to relax. It is truly worth visiting those places. And finally (but I’m sure I forget many things to tell) I took a cool silver tour in a distillery (Gaelic speakers say “Slange-va” to say “Cheers”) and danced traditional Scottish dances. Everything is just pretty awesome, awesome, awesome!

Hello from Zoidberg!

New experiences – Pentland Hills, Botanic Garden, a strange and crazy girl, Zoidberg, Cramond.

Enjoy!

Marta

I feel like the second book in a trilogy is always the best, so fingers crossed this will be the best blog.

After the first week in Ljubljana, my supervisor let me loose. This meant all the hard work and training would be put into action. It began with a fun little exercise, using OpenCASCADE to pixelate 2D cuts of the tokamak fusion reactor CAD model, to be used in the electromagnetic code EFIT++ used at the Culham Centre for Fusion Energy.


Pixelisation and Refinement of a 2D cut of the Tokamak Casing

The images above represent the development of the pixelisation process. The first image shows the original 2D shape, the second image the first stage of pixelisation, and the third the refinement stage. The goal is to keep the number of pixels in the shape as small as possible. These processes are controlled by a number of variables that can be set by the user.

I'm sure many of you aren't convinced by the images – they aren't going to blow you away. Maybe if I had set the background to "rainbow desaturated" they would have looked a little cooler. For those of you who have no idea what "rainbow desaturated" is, that's probably a good thing; I myself have spent far too much time analysing the pre-set colour schemes available in ParaView.

The work described above has been running in parallel with the development of an automated CFD workflow. This "amazing" CFD workflow uses a project by Marijo Telenta as an example test case: an analysis of the geometric factors of wind barriers and how they affect their performance.

The first step was to write an OpenCASCADE script to build the wind barriers based on a number of input parameters from the user.


Full Wind Barrier

The main values were the porosity, height and width of the full barrier, and the height, thickness and angle of the individual bars. If at this point you are thinking this guy is a complete wally and you have no idea what he is talking about, I have attached some images above and below that I hope will clear things up. The image above shows the full barrier with its dimensions. The porosity is defined as the free space in the barrier divided by the total available area, seen from a front-on view.
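Just to make the porosity definition tangible, here is a toy calculation with made-up dimensions (not the ones used in the project), for a barrier built from horizontal bars and ignoring the bar angle.

```python
def porosity(barrier_height, n_bars, bar_height):
    """Fraction of open (free) frontal area in a barrier made of horizontal bars.
    The barrier width cancels out, since bars and gaps span the full width.
    Bar angle is ignored here; a tilted bar would project a smaller frontal height."""
    solid = n_bars * bar_height
    return 1.0 - solid / barrier_height

# Hypothetical numbers: a 2 m tall barrier with eight 0.15 m bars.
print(porosity(barrier_height=2.0, n_bars=8, bar_height=0.15))   # -> 0.4
```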


Description of the geometric features of the individual bars for each wind barrier.

The image on the left shows a 2D cut of the different wind barrier bar angles, and the image on the right provides a more mathematical description. If at this point you are still struggling, the best thing you can do is enjoy a nice cold beer and thank your lucky stars this is a blog, so you don't have to listen to me try to explain it again. These geometric models will then be used in the software OpenFOAM to simulate fluid flow through the barriers. I will provide you with some lovely images of these simulations in the next blog, mainly because I ran my simulations on the login node of the HPC system – accidentally, I must add – so I had to shut them down mid-run.

I would just like to mention briefly that my expectations of Slovenia got completely blown out of the water. Mr Lango and I have enjoyed quite the array of experiences since being here, from a wonderful drive through the Slovenian mountains to our first wine tasting and some amazing sights. I have uploaded some images that I think are quite nice, to try and encourage you to forget those "let's say, less exciting" images you saw at the top.

CollageA

I'm not going to tell you which picture is which – I feel like it'll be fun to try and match the picture with the description. The first one's tricky: an incredibly handsome man posing with a statue in Trieste. Then Vintgar Valley (Big River), an outdoor movie at the castle, Ljubljana castle x2, volleyball and a lovely hike in the hills.

Along with all the activities we undertook, my roommate and I have had plenty of opportunities to sample the food in Ljubljana, and so, as promised, I have uploaded a little collage of "Lunch with Lango" or "Meals with Mattie" – I really couldn't decide which one to go for; clearly I have had too much free time.

LunchWithLango

It has been an incredible experience so far, and I look forward to showing you the results of my simulations and describing the next month of my time in Slovenia.

As my first blog was completely non-technical, I am now contractually obliged by the powers that be to talk about some technical aspects of what I am actually doing. I decided I would have a go at explaining what lattice QCD actually is and how we use it. But before I can do that, I first need to explain what plain old Quantum Chromodynamics (QCD) is and why we need to put it on a lattice. A disclaimer is in order though: as with all popular science writing done in this fashion, it is impossible to be precise, and there is a good chance some of the things I write are not 100% true.

 

QCD Lagrangian

Source:  http://frankwilczek.com/Wilczek_Easy_Pieces/298_QCD_Made_Simple.pdf

 

Pictured above is the QCD Lagrangian that appears in the Standard Model of particle physics, describing the theory of the strong interaction. It is one of three interactions described by the Standard Model, the other two being the weak interaction and the electromagnetic interaction. Similarly to a more familiar part of the Standard Model, Quantum Electrodynamics (QED), which describes the electromagnetic interaction and its corresponding electric charge, particles which possess "colour charge" participate in the strong interaction. Unlike in electromagnetism, however, there are 3 different colour charges, named red, green and blue; it should be noted that they have nothing to do with actual colour and are just a handy naming convention.

Courtesy of Cern

Source: https://cds.cern.ch/record/1473657/files/SMinfographic_image.png?subformat=

 

Of all the particles in the Standard Model pictured above, the only ones which carry the colour charge are the quarks and the interaction-carrying gluons. There are 6 known flavours of quarks – up, down, strange, charm, bottom and top – each coming in one of 3 possible colours, plus their antiparticles, as well as 8 different "mixed colour" gluons. Another striking difference between QCD and QED is that the gluons themselves carry the colour charge, whereas the photons which transmit the electromagnetic interaction do not. This "self interaction" leads to many of the unique characteristics of QCD. Another difference is simply how strong the interaction is compared to electromagnetism and the weak interaction. The strength of these interactions is determined by a coupling constant (the g in the first image of the Lagrangian): for electromagnetism the number is approximately 0.007, known as the fine structure constant, whereas for the strong interaction the coupling is approximately equal to one at normal energies in the Standard Model (<2 GeV). This is precisely the reason it is called the strong interaction, and why we need lattice QCD.

These different properties of the theory lead to some very unique phenomena, confinement and asymptotic freedom being two. Confinement is the fact that the force between quarks does not diminish as they are separated – in fact it grows linearly. Because of this, when you try to separate two quarks, the energy in the gluon field is enough to create another quark anti-quark pair; thus quarks are forever bound in the strongly-coupled region into composite particles (hadrons) such as the proton, the neutron or the pion. Although analytically unproven, confinement is widely believed to be true because it explains the consistent failure of free quark searches, and it is also demonstrable on the lattice. Asymptotic freedom means that at very high energies quarks and gluons interact very weakly (the coupling goes to zero) and behave almost like free particles; this explains the commonly observed narrowness of high-energy jets in particle collider reactions.

As I mentioned previously, the coupling constant is approximately 1 at normal energies, and this presents a big problem for standard methods of calculation in Quantum Field Theory (QFT). Without going too in depth, calculations are typically done using 'perturbation theory', which basically means expressing our answer as an infinite series in powers of the coupling constant. For QED we have a coupling constant of 0.007, so each consecutive term is significantly less important. But for QCD at everyday energies we have a coupling constant of about one, so each term is just as important (in reality we can only build the series if the coupling constant is less than 1). So we turn away from perturbation theory to lattice QCD to solve our problems. Lattice QCD also helps remedy another major issue that arises in QFT: infinities. QFTs are rife with infinities that require a range of techniques to deal with. Naively, these infinities arise from two separate but related infinities in the real world: the infiniteness of space (known as infrared divergences) and the fact that it is continuous (known as ultraviolet divergences). Happily, putting our theory on the lattice does away with both these infinities in one go: we put our whole space into a box of finite volume and we discretise the space into a finite number of points. Of course lattice QCD isn't some magical tool and we don't solve all our problems for free – new problems do arise, but these are manageable enough to perform calculations.
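To see why the size of the coupling matters so much, here is a toy illustration (with made-up coefficients, not a real QFT calculation) of the terms of a perturbative series c_n·g^n for a weak, QED-like coupling and a strong, QCD-like one.

```python
# Toy illustration: pretend every coefficient c_n is 1 and compare term sizes.
coefficients = [1.0, 1.0, 1.0, 1.0, 1.0]

for name, g in [("QED-like, g = 0.007", 0.007), ("QCD-like, g = 1.0", 1.0)]:
    terms = [c * g**n for n, c in enumerate(coefficients)]
    print(name, "->", ["%.1e" % t for t in terms])

# QED-like: each term is ~140x smaller than the previous one, so truncating the
# series after a few terms is safe. QCD-like: every term is the same size, so
# truncation throws away as much as it keeps, and perturbation theory breaks down.
```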

Source: http://lpc-clermont.in2p3.fr/IMG/theorie/LQCD2.jpg


So, using this idea as well as a Wick rotation, we end up with something that we can actually simulate on a computer, which should, in the limit that the lattice spacing goes to zero and the volume goes to infinity, give correct physical results.

Sometimes in life we have to accept dares or challenges, or even make commitments to ourselves, to achieve success. That is human nature: after reaching one milestone we look to the next, and in my opinion:

Being unstoppable and having the right attitude are the first steps towards success

I've spent one month and 3 days playing with ParaView Catalyst examples and trying to understand the FORTRAN simulation code (given by my project coordinator), but now it was time to accept the "Dare to visualise the Tornado Effect". It is onerous as well as interesting to understand someone else's code (especially 1800 lines of code); it leads you into a world of conundrums, but on the other hand you feel like Sherlock Holmes solving riddles.

The Crazy Code Conundrum

The code was using a 2D X-pencil decomposition and a Fast Fourier Transform library to divide the 3D domain (a cuboid box with dimensions 20x20x110) between the cores, and it was used in an unusual way: a domain of double size was used for the scientific calculations. So the first puzzle was to understand which part of the domain holds the data to visualise. After brainstorming with Sandro Frigio (my project coordinator) on Skype, we came to the conclusion that the data is in the first part of the domain. Then I checked the code again and marked the places where I could make changes, to put in an adapter function (the backbone of in-situ visualisation) and the parameters which I had to visualise. I sorted the parameters, made some adjustments to the code, and it was ready for visualisation. That's how I solved the code puzzle.


Trying to fix the code !!!

The Bright Side of Visualisation 


The dark night – wrong one

It was time for visualisation, and guess what happened. As usual, guys:

The bright side always shows up after seeing the dark night

When I ran the simulation code with ParaView Catalyst, I saw something colourful in my black and white life. But when I discussed it with Sandro Frigio, it turned out the result was wrong. I checked everything again and again, but the result was still the same. Then I added some filters in ParaView, like calculator, clip, slice and many others. In the end I saw something which looked different and seemed correct to me.

 


The bright side – glowing Donut

It was the glowing donut – by which I mean the FFT velocity field values near the Gaussian centre (something technical). But I was still thinking it might be wrong. Luckily the day was mine, and my project coordinator was happy with the donut. The complete velocity field can only be observed if we use a greater domain size. At the same time, we were worried about a problem with ParaView Catalyst when dealing with a rectilinear grid without using filters like clip, slice etc. I discussed this with my colleagues over lunch, and we came to the conclusion that it's a minor bug in ParaView which can be neglected.

The story of working with a big domain size and the problems it brings…

To Be Continued……

 


The Lunch time …

 

The Chill Out Time 

Friday was great: I got the correct result and was hoping to see good results with a big domain size too. Now it was time to chill out. Petr and I visited Bologna, exploring the nightlife, and we enjoyed live music in Piazza Maggiore. Then we discovered an awesome Irish pub, now one of my favourites, and we chilled out there the whole night drinking cold Guinness and Italian beers.


Enjoying Fest with Lugi Calori

After a relaxing Saturday, we had an invitation from one of our site coordinators, Lugi Calori, to visit Vergato near Bologna. The place, with its magnificent mountains and amazing climate, is the best place to live in summer. There was a summer fest which exemplified the local traditions. We had delightful, mouth-watering food with orchestral music and, as usual, local Italian wine. It was a wonderful time in Vergato – thanks to Lugi Calori.

 

 

As I described in my previous posts, solving a linear system in a lattice QCD application can be a very long process. We really want to optimize this linear solver, so we can find some answers about matter in our world.

In HPC, every time we want to obtain better performance we have two ways:

  •  Use different hardware (for example, a more powerful supercomputer, or more processors)
  •  Use different software (for example, a more efficient method)

In this project we are aiming at exploiting both.

The hardware: what is a GPU? Why do we need GPUs?

This is an NVIDIA GPU: a small dimension but a big computational power.


GPUs were originally created for manipulating computer graphics and 3D images. They are everywhere: in your laptops, in embedded systems, in videogame consoles and in your mobile phone.

Since images are made of pixels and each pixel can be modified separately, GPUs have an intrinsic parallel structure: each GPU has hundreds of cores that can handle thousands of threads.

It did not take long for HPC scientists to start wondering about using GPU parallelism not only for processing images, but for scientific computing as well; GPUs are now used as coprocessors along with normal CPUs in scientific computations. And in my project, I am using GPUs to accelerate our linear solver.

Differences between CPUs and GPUs (image taken from NVIDIA).


 

The software: what do multigrid methods and Russian Matryoshka dolls have in common?

Our physical world is continuous: you can move smoothly through space. But for a computer's bits, a continuous world is impossible to handle.

How can we deal with that? A simple solution is to 'draw' a discrete grid over our continuous space. In this way you can use your computer to calculate, for example, the velocity of a particle at each point of the grid. You see? You are actually solving a linear system in which the velocities are the variables!

A discretized multigrid version of a bunny. Image has been taken from this article



If you want to know the trajectory of the particle very accurately, you would draw a very big grid with many points.

More points → more variables → bigger linear systems → much more time to solve them!

With multigrid methods we build two grids: a very big one (the fine grid) and a small one (the coarse grid). The small one is easier to solve, but its solution is not accurate enough for our problem. We take this solution and we 'jump' onto the fine grid: we then use this approximate solution as a starting point for solving our problem on the fine grid.

Matryoshka multigrid method


Of course, you can decide that your coarse grid is still too big and too difficult to solve: no problem, because you can create another, smaller grid, and then another one, and then another one. In the end, you can have many grids of different sizes and you can jump from one to another (yes, like with the Russian nesting dolls called Matryoshkas).

In practice, with a multigrid method you do the following (a small code sketch follows the steps):

1. You compute an approximate solution on the fine grid (this is only an approximation so it does not take long. This phase is called pre-smoothing)

2. You project your solution from the fine grid to the coarse one (this is the ‘jump’).

3. You solve the problem on the coarse grid, using the projected solution as a starting point.

4. You project everything back to the fine grid, and (optionally) you try to improve your solution a little bit (this is called post-smoothing)
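Here is a tiny, hedged sketch of those four steps as a two-grid correction cycle for a 1D Poisson problem. It has nothing to do with the real lattice QCD operator: the grid size, right-hand side and numbers of sweeps are made up, and the coarse 'solve' is just a longer relaxation instead of an exact solve or a recursive call.

```python
import numpy as np

def smooth(u, f, h, sweeps=3):
    """Jacobi relaxation for -u'' = f on a 1D grid (used for pre/post-smoothing)."""
    for _ in range(sweeps):
        u[1:-1] = 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def two_grid(u, f, h):
    u = smooth(u, f, h)                          # 1. pre-smoothing on the fine grid
    r = residual(u, f, h)                        # 2. 'jump' down: restrict the residual
    r_coarse = r[::2].copy()
    e_coarse = smooth(np.zeros_like(r_coarse), r_coarse, 2 * h, sweeps=50)  # 3. rough coarse solve
    e = np.zeros_like(u)                         # 4. project the correction back up...
    e[::2] = e_coarse
    e[1:-1:2] = 0.5 * (e_coarse[:-1] + e_coarse[1:])
    return smooth(u + e, f, h)                   # ...and post-smooth

n = 65                                           # fine grid points (coarse grid has 33)
h = 1.0 / (n - 1)
x = np.linspace(0, 1, n)
f = np.sin(np.pi * x)                            # made-up right-hand side
u = np.zeros(n)
for _ in range(10):
    u = two_grid(u, f, h)
print("residual after 10 cycles:", np.max(np.abs(residual(u, f, h))))
```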


Multigrid in practice: smooth, project, solve and project back. Repeat until you are satisfied. Image taken from the university of Illinois website

 

Conclusions

So we have the hardware (GPUs) and the software (multigrid methods). Putting these two together, it seems that our challenge is finally coming to an end! Will we manage to boost lattice QCD performance? The Summer of HPC is almost finished, and we will soon have the answer…

Having already introduced you to the technique that will be used for my project in one of my previous posts, let me share with you some more things about the application.

The application will be used at outreach events and science festivals to demonstrate how a task farm shares work across many independent nodes. Users can join a compute “cluster” with their own smartphone by loading a web page and act as a node in a distributed task farm.


Mobile view of the worker page

The users will see how their phones contribute small pieces of work that will be joined together to create the final result on a large screen. The final result is a fractal image, constructed piece by piece from the worker devices.

At the beginning of the execution, the master chooses the parameters of the application. For example, the master can choose the algorithm that will be used (Mandelbrot set or Julia set), the number of iterations (how many times the equation will be calculated) or the size of every task (in pixels). Each task consists of the coordinates of a little square of the picture. By loading the web page of the application, a device can become part of the cluster and compute part of the fractal. Then the master collects the results and glues them together like a jigsaw puzzle.
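To give a feel for what one of those tasks actually is, here is a hedged sketch (in Python, although the real workers run JavaScript in the browser) of the work a single tile represents: iterating z → z² + c for every pixel of a small square. The coordinate mapping, tile size and function name are illustrative only.

```python
def compute_tile(x0, y0, tile_size, width, height, max_iter=100):
    """Return iteration counts for one square tile of a Mandelbrot set image."""
    counts = []
    for py in range(y0, y0 + tile_size):
        row = []
        for px in range(x0, x0 + tile_size):
            # map the pixel to a point c in the complex plane
            c = complex(-2.0 + 3.0 * px / width, -1.5 + 3.0 * py / height)
            z, n = 0j, 0
            while abs(z) <= 2.0 and n < max_iter:
                z = z * z + c
                n += 1
            row.append(n)
        counts.append(row)
    return counts              # the master glues these tiles into the final image

tile = compute_tile(x0=0, y0=0, tile_size=16, width=640, height=480)
```

Since each tile depends only on its own coordinates, the tiles are completely independent – which is exactly why a task farm with no worker-to-worker communication is enough here.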


The web page for the result visualization

Nevertheless, there are cases where this technique can't be applied and communication between the workers is necessary. An example of such a case is the traffic model, which is used to predict traffic flow and to look for effects such as congestion. Simulation in transportation is important because it lets us study models that help to better plan, design and operate transportation systems. So my second application had to demonstrate this case.


The web page for the traffic visualization

The road is divided into a series of cells, either occupied or unoccupied. In each step, a car moves forward if the space ahead is empty. Every user controls a different part of the road, so for the boundary cells the users send messages to their neighbours to find out whether the cell is occupied or not. Users can see on their screen how the cars move on the part of the road they control and the messages they send to their neighbours. On the master's screen, the traffic on the whole road and all the messages are shown step by step.
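Here is a hedged sketch of one update step for a single worker's road segment (again in Python rather than the browser JavaScript of the real application; the function and variable names are made up). The extra argument plays the role of the boundary message received from the neighbour on the right; cars arriving from the left neighbour would be handled by a symmetric message, omitted here for brevity.

```python
def step(segment, ahead_of_last_cell):
    """segment: list of 0/1 cells (1 = car). ahead_of_last_cell: the neighbour's
    answer about the cell just beyond this segment (the boundary message)."""
    extended = segment + [ahead_of_last_cell]
    new = segment[:]
    for i in range(len(segment)):
        if segment[i] == 1 and extended[i + 1] == 0:
            new[i] = 0                  # the car drives off this cell...
            if i + 1 < len(segment):
                new[i + 1] = 1          # ...into the next one (or into the neighbour's segment)
    return new

print(step([1, 1, 0, 1, 0], ahead_of_last_cell=0))   # -> [1, 0, 1, 0, 1]
```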

In case you have checked out this video, you might be wondering what this project is all about. In case you haven't, I will wait 3:33 minutes for you to watch it and come back 🙂

Ok, now a little background story for the idea of the project. The visualization group at BSC has been involved in music projects and took part in Sónar+D 2016.

All the electronic music fans, and many more, know the Sónar Festival. In case you don't, Sónar is the International Festival of Advanced Music and New Media Art, taking place in Barcelona over three days every June. It has both night and day music events, and alongside Sónar by Day there is Sónar+D, the international conference that brings together creativity and technology.

Sónar+D project

They know what you did!

So the great scientific visualization group of BSC-CNS was there, running a project called "We know what you did": deploying at Sónar by Day a network of sensors to detect and follow wireless communication devices (like mobile phones), and making a real-time analysis of how they move around the space of the festival.

Ok, but what am I doing there? Well, their love for music inspired the project of classifying artists' discographies. Up to now, several scientific papers and methods have been proposed for classifying music genres. But what happens if we want to know more, not just about a single genre, but about an artist? Our goal was to classify similar songs into groups in order to see the variety of their music. Or, if there is a specific song that you like from an artist, you can see similar works by them. Cool, isn't it?

Beginning with the discography of an artist with a long career, we analysed the songs, extracting features that describe them. Then, after performing a data analysis, we were able to group songs by similarity. The stages can be seen in this image:

The stages involved for Discography classification


The audio extraction is used to obtain audio feature sets that are evaluated on their ability to differentiate songs. The feature sets include low-level signal properties and mel-frequency cepstral coefficients (MFCCs). Low-level signal parameters refer to a physical description of a song; MFCCs are commonly used in voice recognition and are based on human hearing perception.
At the end, each song is described by 72 features.

Not all of this information is useful, and in the end only a small percentage of it contributes to the final result. We use Principal Component Analysis (PCA), a statistical procedure used to emphasise variation and bring out strong patterns in a dataset; roughly speaking, if a feature's value doesn't change much across songs, it's not very useful.

By keeping 99% of the original information, sometimes we ended up with only 2(!) useful features.
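A minimal sketch of this reduction step, assuming scikit-learn for the PCA (the random matrix below just stands in for the real 72-feature song table):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(150, 72)              # e.g. 150 songs x 72 extracted features

pca = PCA(n_components=0.99)             # keep enough components for 99% of the variance
X_reduced = pca.fit_transform(X)
print(pca.n_components_, "components kept out of 72")
# With real, correlated song features far fewer components survive than with this random toy data.
```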

 

Now that our data is pretty, clean and tidy, we move on to clustering. Cluster analysis combines data mining and machine learning. What we try to do is group similar songs: songs in the same group (called a cluster) are more similar to each other than to those in other groups (clusters).

But how do we know what the best number of clusters is? Science is here to save the day, with a metric called silhouette width, which indicates how well the clusters are formed. The silhouette width has a range of [-1, 1], and a value close to one means that items inside the groups are very similar among themselves and the groups are well separated.
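A hedged sketch of how the number of clusters could be chosen this way, again assuming scikit-learn and reusing the X_reduced matrix from the PCA sketch above (the range of k values is arbitrary):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

best_k, best_score = None, -1.0
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_reduced)
    score = silhouette_score(X_reduced, labels)     # in [-1, 1], higher is better
    if score > best_score:
        best_k, best_score = k, score

print("best number of clusters:", best_k, "with silhouette", round(best_score, 3))
```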

David Bowie clusters using the K Means clustering algorithm. The clusters are very merged and not a clear data structure was found


But in the end, all this is just mathematics. Clustering methods lack intuition, so human inspection of how the clusters are formed is necessary: someone has to make a judgement based on an understanding of what the data represents, what a cluster represents, and what the clustering is intended to achieve.

For performance, and to exploit the potential of MareNostrum, Barcelona's supercomputer, we used PyCOMPSs to parallelise the audio extraction stage. BSC has developed COMPSs, a programming model and runtime that aims to parallelise applications written in sequential programming languages (like C, Java or Python). It makes parallelism easy even for those who don't have a strong programming background! Although in our case it was not necessary, because of the small number of songs per artist, PyCOMPSs could also be used in the PCA and clustering stages.
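A hedged sketch of how the audio extraction could be turned into PyCOMPSs tasks; extract_features is a placeholder standing in for the real extraction routine, and the file names are hypothetical.

```python
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

@task(returns=1)
def extract_features(song_path):
    # placeholder for the real audio extraction (e.g. the MFCC code sketched earlier)
    return [0.0] * 72

songs = ["song_%02d.mp3" % i for i in range(20)]      # hypothetical file names
futures = [extract_features(s) for s in songs]        # tasks run in parallel on the cluster
features = compss_wait_on(futures)                    # gather the results
```

Since the songs are independent of each other, the runtime can spread these tasks over MareNostrum's nodes without any change to the sequential logic.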

We understand data better when we see it. The data is brought to life using web technologies like HTML, CSS and the JavaScript library D3, producing dynamic, interactive data visualizations in the browser. The visualization shows the result of the cluster analysis, using a default colouring but also giving the user the possibility of defining their own clusters.

David Bowie. Default coloring


Pink Floyd. The wall (1979)


You can see the results and play around with the artists here! Visualization result

 
