It is a great pleasure for me to tell you how I feel about the training week and the organisation itself before jumping into the technical details of my project. The first day of the training week was like a great orientation day: we listened to excellent presentations and met our peers with lots of chatting. In my opinion, it is so important to provide everything needed so that we do not feel the limitations of an online meeting. It was a lovely introductory day to the programme, where everyone (both students and organisers) had a chance to reflect on their expectations. The effort PRACE put into making the meeting as interactive as possible deserves a lot of respect. Zoom was used very effectively: we jumped into breakout rooms to meet our peers and then came back to the main room to follow the presentations again. Moreover, plenty of surveys and a visual collaboration workspace, MURAL, were used to keep us engaged. The rest of the training week consisted of MPI, OpenMP and CUDA reviews. Apart from the great content, when I looked at the schedule, the optional stretching and yoga sessions put a smile on my face. The slides and hands-on exercises were well prepared and almost no prior knowledge was assumed.

with my Summer of HPC T-Shirt

Later in the week, I met my teammate and supervisor. Busenur Aktilav is a senior Computer Engineering student at IYTE and Buket Benek Gürsoy is a computational scientist at the Irish Centre for High-End Computing. We are all Turkish, and coming from similar cultures lets us share more things; I am looking forward to working with them! In the group meeting, our supervisor introduced us to Kay, the supercomputer we are going to work on. I previously worked on Gandalf, the supercomputer at Sabanci University. When I switched to Gandalf from my old loyal laptop, I was amazed by its specifications. However, now that I compare it with Kay, I see that the adjective “super” is a very subjective word. Kay is the biggest computer I have ever worked on! Hopefully, at the end of our project, “GPU acceleration of Breadth-First Search algorithm in applications of Social Networks”, I will be able to say “This is the real HPC”, and the adjective “real” will not be as subjective as “super”.

From theory to practice.

Hello, my name is Aitor Lopez, I am 23 years old and I come from Spain. I’m a recent graduate in Mathematics and Computer Engineering, and I’m starting a master’s degree in artificial intelligence and machine learning because I want to combine the potential of mathematics with the latest computer technology, and this field mixes both.

I consider myself a person who likes theoretical sciences, such as abstract mathematics and computer science. But I have a soft spot for problem-solving, so I like to participate in algorithm competitions such as Google Hash Code and hackathons. I think that facing great challenges in this type of competition helps you improve your skills and brings out the best in you.

For all these reasons, I want to learn about HPC to be able to accelerate different algorithms and thus complement theory with practice. To achieve this I’m going to participate in the PRACE Summer of HPC, in the High-Performance Quantum Fields project at the Jülich Supercomputing Centre, Germany. In this project, I’m going to run several quantum field algorithms on different hardware architectures to measure their performance.

QCD & HPC

SoHPC has already started and we have successfully completed the training week, which was conducted online due to COVID. During this week, we learned the basics of HPC and parallel programming techniques, like MPI and OpenMP on CPUs and CUDA on GPUs. We also had the opportunity for an online visit to the VSC supercomputing infrastructure in Vienna, where we executed our exercises, and to meet many students who, like me, are looking to learn new things and overcome new challenges. The only thing left to do is to put all of this into practice in the project.

If you want to know more about me or about what I’m going to do, don’t forget to check my blog weekly. See you soon!

When my T-shirt has just arrived

Hello everyone! My name is İrem Naz and I am 23 years old. I am joining this programme from Izmir, Turkey, and I am studying computer engineering at Dokuz Eylül University; next term will be my last year at school. Apart from my school life, I enjoy doing sports (especially pilates and yoga), handcrafted painting (e.g. stone painting, renovating items) and spending time with my friends. For the SoHPC programme, I was selected for the “Implementation of Parallel Branch and Bound algorithm for combinatorial optimization” project, located at the University of Ljubljana. Are you ready to learn more about my participation in the programme and about me? Fasten your seatbelts!

The Application Process

Maybe you haven’t decided what to do right now for your future, or you don’t know how to get started. I was in exactly this position and I wanted to take a step. I was wondering what I could do in a bigger world with my interest in the parallelism and analysis courses I took at school. While drowning in these thoughts, I learned about the Summer of HPC programme. I applied, thinking it would be a perfect start. As I read the blogs where participants shared their experiences in recent years, I became convinced that this decision was the right one, because this programme is a summer camp with plenty of information, solidarity and adventure. The fact that it would be online this year for pandemic reasons did not put me off applying; on the contrary, I was very excited when I found out that my application was accepted and I couldn’t wait to share it with my loved ones.

The Training Week

This week, which was held online this year, was a week of learning new things for me, like MPI, OpenMP and CUDA. As someone who takes notes on everything, I can say that I went through a lot of paper. On top of that, it was a week full of yoga and body movements between the activities, and of sessions organised so that the participants could meet. In this way, I met my project teammate Carlos, and our mentors told us about the plans for the coming weeks. Now I say goodbye to you, hoping to meet again with more news about my project!

In training week #summerofyoga

“Take a look at this, it may interest you…”. I found these words in an email my thesis supervisor sent me last year. Attached to it was a link that brought me to the homepage of SoHPC. I started reading about it and felt more and more intrigued the further I went. After some thinking, I decided not to participate, but I promised myself to try the next year. And that’s exactly what I did, even if things didn’t go completely as planned… To cut the story short, I didn’t expect to write this blog post in my hometown here in Italy, after a week of remote learning. But, as someone would say, when life gives you lemons, make lemonade.

Wait, I still haven’t introduced myself! I’m Roberto, a last-year student of the master’s degree in Computer Science and Engineering at Politecnico di Milano. But I think that what I do is more interesting than who I am: High-Performance Computing is my field, and I take every opportunity I find to explore it further. My master’s thesis is an example of this: I’m working on fault mitigation and tolerance for MPI applications. If you are interested in knowing more about it, here you can find a research proposal that describes the topic and, if you are not yet satisfied, feel free to contact me; I will be glad to talk about it.

I decided to work on fault tolerance here at SoHPC as well. The project is based on Charm++, a parallel programming framework that already features fault tolerance. We (my teammate Petar and I; check out his blog too) will improve the existing fault tolerance mechanism by adding more features. Here you can find more details about it if you are interested.

Well, you may be interested in what I do outside High-Performance Computing. In my free time, I mostly play video games and watch TV. Some evenings I play board games with my friends, and we always have a great time. I always try to squeeze in some time for physical exercise to keep myself fit, and sometimes I even go running. I’m preparing for a running competition organized by my university: a 10 km race across the streets of Milan, and I want to improve on my last year’s performance of 56 minutes.

I’m not a big fan of social networks, but I use GitHub and LinkedIn. If you don’t want to miss anything about what I do, feel free to add me. Moreover, on GitHub you can find a lot of projects I worked on in the past, and some of them may interest you. They explore a wide variety of fields, from functional programming to hardware design, from High-Performance Computing to a rather strange Telegram bot, so I doubt you won’t find anything interesting…

That’s all about me I think, I’ll just attach a picture of my hometown, Verbania. It’s a small city on the shore of Lake Maggiore, in the northern part of Italy. It’s not Vienna nor Edinburgh, but it’s still a beautiful place.

Verbania, my hometown

Instead of cancelling in these pandemic times, the programme decided to double the number of participants (50 selected for 24 projects) and organise remote training and project mentoring. Applicants welcomed this decision with comments like:

  1. Indeed we are lucky that computing is one of the things we can still do in this extraordinary time. I am still very enthusiastic about taking part in SoHPC!
  2. I am currently working remotely for college so continuing to do so is not an issue. My whole family is working remotely as this is and will be the new normal method of working for the foreseeable future so I fully support the decision to switch the project to a remote one
  3. I’m not happy about that but I understand that Covid-19 is a serious threat.
  4. I think it would be the best moment to learn more about programming as staying at home might be recommended or even mandated. Also, now I am getting used to this type of work as my final degree projects require cluster and local programming. Furthermore, I am used to communicating with partners and tutors weekly as we work with the same framework and data.
  5. Remote is not a problem; it also has some good sides. I really want to take part in this program.
  6. Those are the times of unfortunate events but it’s also good for academic improvement since we all stay at home and focus on study. I’d happily like to attend SoHPC distant education. Creating a group of students is also good for starting new friendships and academic networks. I hope I can be part of this great organization. Best regards.
  7. While it is a bummer the program cannot continue as planned, given the recent developments, I fully understand the need to make this decision and I am on board with it.
  8. I would be more than happy to participate in the programme remotely, if selected.
  9. Despite the exceptional circumstances, project subjects are intriguing enough to be done remotely. On the spot or not, PRACE Summer of HPC remains as an excellent opportunity to learn plenty of new stuff during the internship. Into the bargain, I’ll be given a fancy t-shirt 🙂
  10. Right now I’m even more excited about participating. Working remotely in a team will be part of any job in the future, and getting experience of this kind of situation may be very useful. Plus I can still achieve the goals I set for myself. I can see only positive aspects in this kind of solution. Thanks for still giving us an opportunity and for all your work.
  11. Considering the past events, I was concerned that SoHPC might have been cancelled for this year, therefore I am happy and hopeful as I still have a chance of being selected for the program! I am now comfortably at home and I have both the chance and the high motivation to complete a project remotely if selected. I believe that recent events will not condition my ability to perform high quality work if selected.
  12. Although I regret not being able to live the full SoHPC experience, I think that it is still an excellent opportunity to work with HPC experts of the best centres around Europe. Moreover, working in groups could make the experience more challenging and enriching, given that selected students should cooperate with people of other countries and with different academic background. Last but not least, dealing with cooperating seems an excellent training for the future, both for academic and professional careers.
  13. In tough times we take exceptional measures.
  14. I would still be delighted to participate in the SoHPC programme – to get the fantastic opportunity to work on one of the projects. The remote structure sounds very good – I think it would be great to work in a team with other SoHPC participants!
  15. I’d be happy to participate in the project under any circumstances.
  16. Participation in the programme is still a valuable experience; doing it remotely is not optimal, but given the circumstances I think it is for the best. I will try to make the most of what it is and go through it to the best of my abilities.
  17. All of us had to give up our normal lives because of this disease that is everywhere in the world. We are currently successfully distance learning from our schools. I believe that remote work will also be successful.
  18. Many events and flights are cancelled and schools are closed due to the pandemic. In this chaos, it would not be surprising for a big event like SoHPC not to take place. However, instead of cancelling the event, running the training remotely will still be a great opportunity for us, because what SoHPC offers does not change. We will still be working on a project with the finest HPC centres. Of course, it is a little disappointing not to meet people in person and not to have a cup of coffee with them to discuss ideas. However, I believe everyone will do their best to work on their project and also keep the spirit of SoHPC alive.
  19. Thank you for making this internship possible, earlier I was thinking that it will not take place. But glad it will take place and happy to be part of it.

Applications are open from the 11th of January 2020 to the 26th of February 2020. See the Timeline for more details.

The PRACE Summer of HPC programme is announcing its projects for 2020 for preview and comments by students. Please send questions to the coordinators directly by the 11th of January. Clarifications will be posted near the projects in question or in the FAQ.

About the Summer of HPC program:

Summer of HPC is a PRACE programme that offers summer placements at HPC centres across Europe. Up to 24 top applicants from across Europe will be selected to participate. Participants will spend two months working on projects related to PRACE technical or industrial work and produce a visualisation or video. The programme will run from early July to the 31st of August.

For more information, check out our About page and the FAQ!

Ready to apply? Click here! (Note, not available until January 10th, 2020)

Have questions not answered in the About section or the FAQ? Email us at sohpc16-coord@fz-juelich.de.

Programme coordinator: Dr. Leon Kos, University of Ljubljana

Project reference: 2024

Creating a mesh* – a discretized surface enclosing a given spatial object – is a compute-intensive task central to many scientific fields ranging from physics (Maxwell’s Eq., CFD, MD) to computational biology (PB, MM/PBSA, cryo-EM, Struct. Bio). The marching cubes algorithm (doi: 10.1145/37402.37422) has become a standard technique to compute meshes in an efficient way. Despite its ubiquitous use, a few fundamental limitations have been reported, in particular geometric ambiguities (doi:10.1016/j.cag.2006.07.021). Such problems can be alleviated – and the entire algorithm made significantly less complex – by switching from cubes to tetrahedrons as basic building blocks. Therefore, this project aims to explore basic feasibility and potential limitations of the marching tetrahedrons algorithm specifically targeting GPU  implementations on modern HPC platforms. A large variety of individual activities are planned (according to skill level and interest of the trainee) starting with simple geometric considerations and including all essential steps of the development cycle characteristic of contemporary scientific software development.

Apart from the promising outlook, i.e. having worked and gained experience with an important subject of very broad applicability, there is a large body of reference work that can help to jump-start the project, e.g. plenty of literature, a public domain implementation (http://paulbourke.net/geometry/polygonise/), a closely related GPU sample in the CUDA SDK (https://docs.nvidia.com/cuda/cuda-samples/index.html#marching-cubes-isosurfaces) and a number of alternative surface computation programs for comparison in terms of performance as well as robustness.
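To make the geometric idea more concrete, below is a minimal Python/NumPy sketch of the per-tetrahedron step: the four vertices are classified against the iso-value and triangles are built by linear interpolation along the intersected edges. The function and variable names are illustrative and not taken from the project code; in particular, consistent triangle orientation (winding/normals) is not handled here, which real implementations, such as the public domain code linked above, manage via a lookup table indexed by the vertex sign pattern.

```python
import numpy as np

# The six edges of a tetrahedron as pairs of local vertex indices.
TET_EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def interp(p0, p1, v0, v1, iso):
    """Linearly interpolate the point where the scalar field crosses `iso`."""
    t = (iso - v0) / (v1 - v0)
    return p0 + t * (p1 - p0)

def polygonise_tet(points, values, iso):
    """Return a list of triangles (each a 3x3 array) for one tetrahedron.

    points : (4, 3) array of vertex coordinates
    values : (4,)  array of scalar field values at the vertices
    """
    inside = values < iso                     # classify the four vertices
    n_inside = int(inside.sum())
    if n_inside in (0, 4):                    # fully outside or inside: no surface
        return []

    # Intersection points on every edge whose endpoints lie on different sides.
    cuts = [interp(points[a], points[b], values[a], values[b], iso)
            for a, b in TET_EDGES if inside[a] != inside[b]]

    if len(cuts) == 3:                        # one vertex separated: one triangle
        return [np.array(cuts)]
    # Two vertices on each side: the four cut points form a quad, split in two.
    # (Orientation is not made consistent here; a lookup table would do that.)
    c = np.array(cuts)
    return [c[[0, 1, 2]], c[[1, 2, 3]]]

# Tiny usage example on a single tetrahedron.
pts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
vals = np.array([0.2, 0.8, 0.9, 0.7])
for tri in polygonise_tet(pts, vals, iso=0.5):
    print(tri)
```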

*Note the spelling difference to ‘mess’.

Mesh creation for ubiquitin (a small protein, PDB code 1UBQ) can be achieved by applying the marching cubes/tetrahedrons algorithm, resulting in a closed surface discretized into triangular elements (white envelope in the panel to the right).

Project Mentor: Siegfried Hoefinger

Project Co-mentors: Markus Hickel and Balazs Lengyel

Site Co-ordinator: Claudia Blaas-Schenner

Participants: Jake Love, Federico Julian Camerota Verdù, Aarushi Jain

Learning Outcomes:
Familiarity with basic development strategies in HPC environments. A deepened understanding of basic scientific key algorithms and how to obtain efficient implementations.

Student Prerequisites (compulsory):
Just a positive attitude towards HPC for scientific applications and the readiness for critical and analytical thinking.

Student Prerequisites (desirable):
Familiarity with Linux, basic programming skills in C/Fortran, experience with GPUs, basic understanding of formal methods and their translation into scientific applications;

Training Materials:
Public domain materials mentioned in the Abstract and first steps with CUDA (https://tinyurl.com/cuda4dummies)

Workplan:

  • Week 1: Basic HPC training; accustom to local HPC system;
  • Week 2: Literature research and first exploratory runs;
  • Week 3: Set up a working plan;
  • Weeks 4-7: Actual implementation and performance evaluation;
  • Week 8: Write up a final report and submit it;

Final Product Description:
The ideal – yet very unrealistic – outcome would be a fully functional MT program running efficiently on the V100 (at approx. 0.01 s per system composed of 50 k objects), but an equally satisfactory outcome would be a returning summer student who has gained good experience with practical work in HPC environments.

Adapting the Project: Increasing the Difficulty:
Increasing performance gains in absolute terms as well as relative to existing implementations;

Adapting the Project: Decreasing the Difficulty:
Various optional subtasks can either be dropped or carried out in greater detail

Resources:
Basic access to the local HPC infrastructure (including various GPU architectures) will be granted. Additional resources in the public domain have been listed in the Abstract.

Organisation:
VSC Research Center

Project reference: 2023

Nowadays, most High-Performance Computing (HPC) systems are clusters built out of shared-memory nodes. The nodes themselves consist of several CPUs, each providing a number of cores on which the code doing the actual computational work gets executed. However, with current and future systems we observe a trend towards more cores per CPU, together with less memory available per core and also reduced communication bandwidth per core.

A major topic in HPC is to provide programming tools that enable the programmer to write parallel codes capable of scaling to hundreds of thousands, or even millions, of cores. The de-facto standard here is using the Message Passing Interface (MPI).

MPI can be used on its own, as pure MPI. However, this might not be the best choice on state-of-the-art HPC systems because of the memory overhead involved: the overhead of MPI itself grows with the number of MPI processes and, even more importantly, with pure MPI memory may be wasted on data replicated within a CPU or within a node.

A better way is to combine MPI for the distributed-memory parallelization over the node interconnect with a shared-memory parallelization inside the nodes, such as OpenMP or the MPI-3.0 shared-memory model; this is usually referred to as hybrid programming, MPI+X.

In this SoHPC project we will explore different options for optimizing both the memory consumption and the communication time of a prototypical memory-bound scientific code. We will work with a Jacobi solver, a simple stencil code with halo communication in 2 and 3 dimensions. A prototype of this code in 2 dimensions for pure MPI and MPI+OpenMP already exists. It will be the task of the SoHPC student to add an MPI+MPI-3.0-shared-memory version and to extend the code to 3 dimensions. Especially when applying pure MPI, virtual Cartesian topologies provide a way to reduce the amount of halo data that has to be communicated between the MPI processes as well as – at least in principle – a way to reorder the MPI processes to optimize their placement with respect to the hardware topology. Currently none of the MPI libraries offers such a reordering, but the principle is known and can be applied. Starting from a simple Roofline performance model for our code, careful performance measurements will serve to analyse the strengths and weaknesses of the various approaches.
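As an illustration of the communication pattern at the heart of this project, here is a minimal pure-MPI sketch (written with mpi4py for brevity; the actual prototype is in C/Fortran) of a 2D Jacobi iteration with halo exchange on a virtual Cartesian topology. The local grid size, the iteration count and the omitted physical boundary conditions are placeholders.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), 2)           # e.g. [2, 2] for 4 ranks
cart = comm.Create_cart(dims, periods=[False, False], reorder=True)
up, down = cart.Shift(0, 1)                           # neighbours in the row direction
left, right = cart.Shift(1, 1)                        # neighbours in the column direction

n = 64                                                # local interior size
u = np.random.rand(n + 2, n + 2)                      # +2 for one halo layer on each side

for it in range(50):
    # Row halos are contiguous: send first/last interior row, receive into halo rows.
    cart.Sendrecv(u[1, :], dest=up,   recvbuf=u[n + 1, :], source=down)
    cart.Sendrecv(u[n, :], dest=down, recvbuf=u[0, :],     source=up)
    # Column halos need contiguous temporaries.
    recv = np.empty(n + 2)
    cart.Sendrecv(u[:, 1].copy(), dest=left, recvbuf=recv, source=right)
    if right != MPI.PROC_NULL:
        u[:, n + 1] = recv
    cart.Sendrecv(u[:, n].copy(), dest=right, recvbuf=recv, source=left)
    if left != MPI.PROC_NULL:
        u[:, 0] = recv

    # 5-point Jacobi update of the interior (the right-hand side is evaluated
    # before assignment, so this is a Jacobi, not a Gauss-Seidel, sweep).
    u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
```

Run with, for example, `mpirun -n 4 python jacobi2d.py`; the MPI+OpenMP and MPI-3.0 shared-memory variants studied in the project keep the same halo pattern but change how data is shared inside a node.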

The picture shows the VSC-4 – the HPC cluster offering 2.7 PFlop/s that will be used in this SoHPC project; photo: © derknopfdruecker.com .

Project Mentor: Claudia Blaas-Schenner

Project Co-mentor: Irene Reichl

Site Co-ordinator: Claudia Blaas-Schenner

Participants: Clément Richefort, Kevin Mato, Sanath Keshav, Federico Sossai

Learning Outcomes:
After working on this project you will be able to designate the key limitations of using a pure MPI programming model on modern HPC clusters and be able to apply several concepts of hybrid programming MPI+X to optimize both memory consumption and communication time of parallel codes. You will also be able to use profiling tools to evaluate the performance of your codes.

Student Prerequisites (compulsory):
We welcome a basic scientific mindset, curiosity, a keen interest in challenging technical innovations and the appreciation of outside-the-box thinking. The student should be able to work on the Linux command line, have a basic knowledge in programming with either C or Fortran, and know at least the basic principles of parallel programming with MPI.

Student Prerequisites (desirable):
Good programming knowledge with either C or Fortran. Experience with programming with MPI.

Training Materials:
The material of our course ‘Introduction to Hybrid Programming in HPC’ http://tiny.cc/MPIX-VSC is a good resource.

Workplan:

  • Week 1: setting up, getting familiar with our ideas about MPI+X
  • Week 2-3: literature study, first runs, workplan (deliverable)
  • Week 4-6: intense coding and performance analysis
  • Week 7-8: producing the video (deliverable), writing the final report (deliverable)

Final Product Description:
During this SoHPC project you will have developed a prototype of a scientific code implementing various forms of hybrid programming MPI+X – this will likely be one or more code files. In addition, you will produce a set of performance measurements of these codes that will most likely form the basis of the final SoHPC project report.

Adapting the Project: Increasing the Difficulty:
Either by programming and evaluating the same MPI+X options for a prototypical compute-bound code or by extending the original code towards MPI+CUDA to make use of the GPUs.

Adapting the Project: Decreasing the Difficulty:
By omitting the extension to 3 dimensions.

Resources:
No major resources needed. You should bring your own laptop. We will provide access and computing time on VSC-3 (0.60 PFlop/s) and VSC-4 (2.7 PFlop/s), two Intel-based cluster systems that we host and manage, together with all necessary software products and licences.

Organisation:
VSC Research Center

Project reference: 2022

As HPC systems become more heterogeneous in nature, it is desirable to have a single programming model that can take advantage of this heterogeneity. In some cases this is really beneficial, because programmers do not need to spend so much time mapping their computation onto whatever HPC architecture resources are available. Novel programming models, for example Charm++, AdaptiveMPI/Charm++, XcalableMP, Thrust, Kokkos and OmpSs, can take advantage of the computation on heterogeneous architecture platforms. The aim of this project is therefore to analyse their limitations and make a comparison between them on an HPC architecture.

https://sciencebusiness.net/news/eurohpc-partnership-opens-bidding-would-be-supercomputer-hosts

Project Mentor: Dr. Sebastien Varrette

Project Co-mentor: Dr. Ezhilmathi Krishnasamy

Site Co-ordinator: Prof. Pascal Bouvry

Participants: Rafał Felczyński, Ömer Bora Zeybek

Learning Outcomes:

  • Learning the novel programming model for the HPC parallel architecture.
  • Understanding the limitation between the programming models.
  • Better understanding of heterogeneous architecture.

Student Prerequisites (compulsory)
Programming skills in C and C++

Student Prerequisites (desirable):
Basic knowledge in parallel programming model or parallel computer architecture.

Training Materials:

Workplan:

  • Week 1: Training week
  • Week 2: Literature review and preliminary report (plan writing)
  • Weeks 3 – 7: Project development
  • Week 8: Final report write-up

Final Product Description:
Benchmarking between the selected list of programming models.

Adapting the Project: Increasing the Difficulty:
Including lower-level programming models, for example MPI, OpenMP and CUDA, and comparing them against the novel HPC programming models might increase the difficulty of the project.

Adapting the Project: Decreasing the Difficulty:
Benchmarking with fewer test cases.
Comparing the results between just 2 or 3 novel programming models.

Resources:
All of the listed programming models are open source.
The student will get a desktop computer and an HPC account on the Iris supercomputer of the University of Luxembourg.

Organisation:
University of Luxembourg

Project reference: 2021

CFD involves solving time-dependent partial differential equations (PDEs) by numerical approximation. This yields systems of equations (matrices and vectors) that need to be solved either by direct or iterative solvers, and these solvers require huge computational power depending on the chosen problem. Pre-processing and post-processing (of the time-step solutions) also require a lot of computational power, for visualization or for making a movie.

Submarines are typically optimized for high performance and noise reduction. DARPA Suboff is a basic submarine model for which experimental data are available, in different configurations of parts and angles of attack. CFD is used here to benchmark the existing model; later this will be used to design a new model with optimized submarine performance.

DARPA Suboff (geometry)
DARPA Suboff (surface mesh)

Project Mentor: Dr. Ezhilmathi Krishnasamy

Project Co-mentor: Dr. Sebastien Varrette

Site Co-ordinator: Prof. Pascal Bouvry

Participants: Shiva Dinesh Chamarthy, Matt Asker

Learning Outcomes:

  • Computational fluid dynamics using either open source or commercial tools to solve the given problem on a HPC setting.
  • Pre-processing, computation and post processing techniques (data analysis).
  • Parallel visualization
  • Optimization (load balancing on HPC setting and design parameters for DARPA Suboff)

Student Prerequisites (compulsory)
Fluid mechanics and basic programming skills.

Student Prerequisites (desirable):
Familiarity with any of the open source or commercial CFD software packages, and computational mathematics.

Training Materials:

Workplan:

  • Week 1: HPC training
  • Week 2: Project preparation
  • Week 3: Pre-processing
  • Week 4: Simulation
  • Week 5: Simulation
  • Week 6: Post processing
  • Week 7: Results analysis
  • Week 8: Report writing

Final Product Description:
Comparison of the simulation results for one of the model configurations against its wind-tunnel experimental results.

Adapting the Project: Increasing the Difficulty:
Including more parameters would make it more difficult, for example considering different angles of attack with different configurations of the DARPA Suboff model.

Adapting the Project: Decreasing the Difficulty:
Skipping the comparison with the experimental results would make the project easier.
Considering just a few simulation parameters makes it even simpler.

Resources:
ANSYS and OpenFOAM are available. Other open source tools (both for simulation and pre-processing) can be installed upon student request.
Paraview and VisIt (data processing and visualization) are also available.

Organisation:
University of Luxembourg

Project reference: 2020

Over the past decades a vast number of algorithms have been proposed to solve problems in combinatorial optimization either approximately or up to optimality. But despite the availability of high-performance infrastructure in recent years, only a small number of these algorithms have been considered from the standpoint of parallel computation. We will review the most important aspects of the branch and bound algorithm, develop a method for finding exact solutions of an NP-hard combinatorial optimization problem and produce an efficient implementation using high performance computing. We will use a parallel branch and bound algorithm that efficiently bounds the original problem using semidefinite relaxations. Sequential algorithms search the dynamically built enumeration tree one node at a time, whereas parallel algorithms can evaluate multiple nodes independently. The aim of the project is to develop and implement different distributed parallel schemes of the branch and bound algorithm using the Message Passing Interface (MPI) for multi-node parallelization and considerably reduce the running time of the exact solver.
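To illustrate one simple distributed scheme, the sketch below fixes the first few branching decisions to generate independent subtrees, distributes them round-robin over MPI ranks with mpi4py, and combines the incumbents at the end. The toy 0/1 knapsack instance and its fractional bound only stand in for the actual NP-hard problems and semidefinite-relaxation bounds of the project, and, unlike a production scheme, the incumbent is not shared between ranks during the search.

```python
from itertools import product
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Toy knapsack instance, sorted by value density so the fractional fill below
# is a valid upper bound (LP relaxation of the remaining subproblem).
values  = [10, 7, 5, 9, 3, 8, 6, 4]
weights = [ 4, 3, 2, 5, 1, 4, 3, 2]
capacity = 12
order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
values, weights = [values[i] for i in order], [weights[i] for i in order]
n = len(values)

def bound(idx, value, weight):
    """Optimistic bound: fill the remaining capacity fractionally."""
    b, room = value, capacity - weight
    for j in range(idx, n):
        take = min(1.0, room / weights[j]) if room > 0 else 0.0
        b += take * values[j]
        room -= take * weights[j]
    return b

def dfs(idx, value, weight, best):
    """Sequential depth-first branch and bound below a fixed subtree root."""
    if weight > capacity:
        return best
    if idx == n:
        return max(best, value)
    if bound(idx, value, weight) <= best:
        return best                                   # prune this subtree
    best = dfs(idx + 1, value + values[idx], weight + weights[idx], best)
    return dfs(idx + 1, value, weight, best)

# Fix the first `depth` decisions to create independent subtrees and
# distribute them round-robin over the MPI ranks.
depth, best = 3, 0
for k, choices in enumerate(product((0, 1), repeat=depth)):
    if k % size != rank:
        continue
    value = sum(v for v, c in zip(values, choices) if c)
    weight = sum(w for w, c in zip(weights, choices) if c)
    best = dfs(depth, value, weight, best)

best_global = comm.allreduce(best, op=MPI.MAX)        # combine local incumbents
if rank == 0:
    print("best value found:", best_global)
```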

Project Mentor: MSc. Timotej Hrga

Project Co-mentor: Prof.  Janez Povh, PhD

Site Co-ordinator: Dr. Pavel Tomšič

Participants: İrem Naz Çoçan, Carlos Alejandro Munar Raimundo

Learning Outcomes:
The student will learn and improve their skills in analysing and designing scalable parallel programs.

The student will learn how to implement a parallel branch and bound method using different schemes.

Student Prerequisites (compulsory):
Programming skills, C/C++, knowledge in OOP, parallel algorithm design

Student Prerequisites (desirable):
Knowledge in combinatorial optimization and semidefinite programming. Familiar with Linux and HPC.

Training Materials:

OpenMPI documentation:
https://www.open-mpi.org/doc
https://computing.llnl.gov/tutorials/mpi

Semidefinite programming:
https://neos-guide.org/content/semidefinite-programming

Branch and Bound:
https://optimization.mccormick.northwestern.edu/index.php/Branch_and_bound_(BB)

Workplan:

  • W1: Introductory week
  • W2-7: Implementing the parallel schemes in the code (creating, running cases, analysing the results).
  • W8: Final report and video recording.

Final Product Description:

  • Implementation of different parallel approaches to distributed branch and bound algorithm
  • Reduce the running time of sequential branch and bound algorithms.
  • Analysing the results.

Adapting the Project: Increasing the Difficulty:
We can increase the size of data instances or add additional parallel scheme to be implemented.

Adapting the Project: Decreasing the Difficulty:
We can decrease the size of data instances.

Resources:
HPC cluster at University of Ljubljana, Faculty of Mechanical Engineering, and other available HPCs.

Organisation:
University of Ljubljana

Project reference: 2019

Due to the complexity of plasma, different mathematical models are used to describe it, depending on the physical aspects of interest. The most complete and widely used plasma description is the kinetic model, which describes the motion of each particle in the plasma. From a mathematical point of view this requires solving the relativistic Vlasov-Boltzmann equations together with the Maxwell equations for the electromagnetic fields. Currently, Particle-in-Cell (PIC) codes are used as the tool for this kind of modelling. Particle-in-cell simulation is a well-established technique which has spawned dozens of codes around the world (e.g. BIT1, VPIC, VSIM, OSIRIS, REMP, EPOCH, SMILEI, FBPIC, GENE, WARP, PEPC), some of which simulate up to 10¹² particles on 10⁶ HPC cores.

In this project we will use a basic parallelized PIC code. The aim is to provide a program that will read all the output files, join them and create profile graphs. The profile graphs give information about the simulated plasma, such as ion/electron temperature, density, velocity and, in some cases, neutrals.
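A hypothetical sketch of such a profile-graph tool is given below: it reads the per-process output files, joins and sorts them, and plots the profiles with Matplotlib. The file-name pattern and the column layout (position, density, ion temperature, electron temperature, velocity) are assumptions for illustration only, not the actual output format of the PIC code.

```python
import glob
import numpy as np
import matplotlib.pyplot as plt

# Read and join the per-rank output files (assumed naming and column layout).
files = sorted(glob.glob("output_rank*.dat"))
data = np.vstack([np.loadtxt(f) for f in files])
data = data[np.argsort(data[:, 0])]                   # sort by spatial coordinate

x, density, t_ion, t_el, velocity = data.T

fig, axes = plt.subplots(2, 2, figsize=(10, 6), sharex=True)
for ax, y, label in zip(axes.flat,
                        (density, t_ion, t_el, velocity),
                        ("density", "ion temperature",
                         "electron temperature", "velocity")):
    ax.plot(x, y)
    ax.set_ylabel(label)
    ax.set_xlabel("x")
fig.tight_layout()
fig.savefig("profiles.png")                           # one panel per plasma profile
```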

Furthermore, a runtime task scheduler (StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, doi:10.1002/cpe.1631) will be investigated as a way to improve the simple PIC code so that it can run on GPUs.

The task-based parallelisation approach offers heterogeneous parallel computing: codelets (i.e. kernels performing arithmetic operations, such as vector-vector multiplication, on the GPU) are provided to the simple PIC code and applied where possible.

The StarPU runtime system and PIC codes algorithm

Project Mentor: MSc. Ivona Vasileska

Project Co-mentor: Prof. Dr. Leon Kos

Site Co-ordinator: Dr. Pavel Tomšič

Participants: Paddy Cahalane, Shyam Mohan Subbiah Pillai, Víctor González Tabernero

Learning Outcomes:
The student will learn and improve their skills in general programming and visualization as well as getting a glimpse in parallel computing.

The student will learn how to implement this method in plasma kinetic codes.

Student Prerequisites (compulsory):
Programming skills, knowledge in OOP,  plasma physics and kinetic plasma theory.

Student Prerequisites (desirable):
Advanced knowledge in python and plasma physics. Familiar with Linux and HPC. Basic knowledge in kinetic plasma modelling.

Training Materials:
StarPU documentation:
http://starpu.gforge.inria.fr/
http://starpu.gforge.inria.fr/doc/html/BasicExamples.html

OpenMPI documentation:
https://www.open-mpi.org/doc
https://computing.llnl.gov/tutorials/mpi 

PIC codes:
https://ptsg.egr.msu.edu/

Matplotlib:
https://www.tutorialspoint.com/matplotlib/index.htm

PyQt5:
https://build-system.fman.io/pyqt5-tutorial

Workplan:

  • W1: Introduction and training week;
  • W2: Learning the kinetic plasma codes (creating and running cases, analysing the results)
  • W3-7: Implementing a visualization program for the code

Final Product Description:
The final results of this project are:

  • Providing a visualization tool for PIC code
  • Improving the speed of the provided simple PIC if possible;
  • Analysing the simulations results.

Adapting the Project: Increasing the Difficulty:
If the outcome is more than successful, prepare detailed documentation on integrating StarPU into the existing physics code.

Adapting the Project: Decreasing the Difficulty:
Focus only on providing a tool for visualization of the simple PIC code.

Resources:
HPC cluster at University of Ljubljana, Faculty of Mechanical Engineering, and other available HPCs and HPC Marconi.

Organisation:
University of Ljubljana

Project reference: 2018

The project goal is creating a monitoring system to capture real-time information regarding the number of jobs running on HPC clusters, while integrating the work with DevOps and Continuous Integration practices and tools. The information regarding the status of the job queues will be collected, processed and stored as time series and summarized and made available to users and administrators in the form of graphs.

The system will specifically monitor job queues for the HPC scheduler SLURM, and will be built around Prometheus for data collection and time-series storage, and Grafana for visualization. It will require the development of a Prometheus exporter, using Go as the preferred language choice, although Python is also an option.
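As a rough sketch of what such an exporter does, the snippet below (in Python with the official prometheus_client library, whereas the project prefers Go) periodically queries squeue for job states and exposes the counts as a labelled gauge for Prometheus to scrape. The metric name, port and polling interval are illustrative choices, not the project's actual design.

```python
import subprocess
import time
from collections import Counter
from prometheus_client import Gauge, start_http_server

JOBS = Gauge("slurm_jobs", "Number of SLURM jobs by state", ["state"])

def collect():
    # Ask SLURM for the state of every queued job, one state per line.
    out = subprocess.run(["squeue", "-h", "-o", "%T"],
                         capture_output=True, text=True, check=True).stdout
    counts = Counter(out.split())
    for state in ("RUNNING", "PENDING", "COMPLETING"):   # extend as needed
        JOBS.labels(state=state).set(counts.get(state, 0))

if __name__ == "__main__":
    start_http_server(9100)      # metrics served at http://localhost:9100/metrics
    while True:
        collect()
        time.sleep(30)           # scrape-friendly update interval
```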

The code will be hosted on SURFsara’s GitLab server and will include automatic tests as well as code for tools such as Ansible for automatic deployment and configuration. The use of Continuous Integration practices and tools to automate the development, testing and release processes of the code will be encouraged.

The distribution of the resulting code will depend on the language of choice: a statically compiled binary for Go or a PIP package for Python. Deployment using containers such as Docker or Singularity is also contemplated. These deliverable items will be automatically generated by CI pipelines.

The student will be assisted in the installation and configuration of the Prometheus and Grafana services, the integration with SURFsara’s CI tools and services, and the deployment and CI setup of the code written as part of the project.

Basic project goals, meme format.

Project Mentor: Juan Luis Font Calvo

Project Co-mentor: Nikolaos Parasyris

Site Co-ordinator: Carlos Teijeiro Barjas

Participants: Cathal Corbett, Joemah Magenya

Learning Outcomes:
The student will learn about the development of monitoring software for HPC environments, supporting it with CI/CD techniques and tools.

By the end of the project, the student will have been involved in all parts of the monitoring process, from the acquisition of the metrics to their transmission, storage and visualization.

Student Prerequisites (compulsory):

  • Programming and software development notions
  • Familiarity with GNU/Linux systems and CLI
  • Basic Shell/bash scripting

Student Prerequisites (desirable):

  • Background in engineering or computer sciences
  • Knowledge of Python or Go programming languages
  • Experience with monitoring, preferably Prometheus and/or Grafana
  • Familiar with VCS tools: git, GitLab, GitHub, …
  • Familiar with Continuous Integration concepts and tools

Training Materials:

Workplan:

  • Weeks 1-2: training, getting accounts and setting up the development environment, analysis of project requirements
  • Weeks 3-7: development of the Prometheus exporter, tests and CI pipeline; configuration of an associated Grafana dashboard
  • Week 8: project wrap-up, documentation and report writing and submission

Final Product Description:
The expected result is a monitoring system (Prometheus + Grafana) for HPC job schedulers.

All the software written during the project will be supported by CI/CD techniques and tools (automatic testing, pipelines, automatic deployment, …).

Adapting the Project: Increasing the Difficulty:
The difficulty of the project could be increased by applying more sophisticated statistical analysis to identify trends in the job queues, as well as by including support for other HPC schedulers such as TORQUE.

Adapting the Project: Decreasing the Difficulty:
The difficulty can be decreased by reducing the complexity of the CI configuration (simpler pipelines and automatic testing), as well as opting for a programming language with which the student is more familiar and proficient.

Resources:

  • computer for development running GNU/Linux or MacOS
  • account on SURFsara GitLab server
  • access to SURFsara HPC/HTC clusters (DDP team)

The above requirements can be provided by the Internal Services department (laptop and user account) and the DDP team (access to HPC/HTC clusters)

Organisation:
SURFsara B.V.

Project reference: 2017

In the race for exascale, supercomputer architectures evolve fast and the variety of competitive hardware solutions has made benchmarking an increasingly important and difficult task for HPC specialists. Today, there is a growing interest from supercomputing centres in easy to use portable frameworks automating the labour intensive tasks of compiling, testing and benchmarking scientific applications.

In order to support the Dutch research community by providing well-tested and optimised applications on our high-performance computing infrastructure, here at SURFsara we are using automated workflows for the deployment of the full software stack on our HPC systems.

These pipelines make use of:

  • Easybuild and Jenkins for software building, installation and continuous integration.
  • XALT for tracking and monitoring of software and resources usage.
  • ReFrame for regression testing and performance analysis.

This project aims at improving the services HPC centres offer to the European computational researchers, providing support to efficiently run their simulations on modern computing architectures and helping them in making motivated choices to adapt to the fast evolution of the HPC systems and hardware.

Using the information gathered with XALT on SURFsara systems, we will identify relevant HPC codes and their main usage (underlying libraries, execution patterns, etc.). This will allow us to select a meaningful set of applications, which we will then integrate into the automated regression and performance testing framework (ReFrame) for the production of detailed benchmarks and performance profiles on different HPC systems and architectures. The outcome of this work will be essential to better understand the performance of the most relevant HPC software using state-of-the-art performance measurement frameworks, and to write recommendations for efficient deployment and usage of the codes on the HPC systems.
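For a flavour of what integrating an application into ReFrame looks like, here is a minimal, hedged sketch of a regression test with a sanity check and a performance variable, written in the ReFrame 3.x style. The system and programming-environment names, the STREAM source file and the reference value are placeholders, not SURFsara's actual configuration.

```python
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class StreamBenchmark(rfm.RegressionTest):
    def __init__(self):
        self.descr = 'STREAM triad bandwidth check'
        self.valid_systems = ['cartesius:normal']        # placeholder system:partition
        self.valid_prog_environs = ['foss']               # placeholder toolchain
        self.sourcepath = 'stream.c'
        self.build_system = 'SingleSource'
        self.build_system.cflags = ['-O3', '-fopenmp']
        self.num_tasks = 1
        self.num_cpus_per_task = 16
        # The run is accepted only if the expected validation line is present ...
        self.sanity_patterns = sn.assert_found(r'Solution Validates', self.stdout)
        # ... and the triad bandwidth is recorded as a performance metric.
        self.perf_patterns = {
            'triad': sn.extractsingle(r'Triad:\s+(\S+)', self.stdout, 1, float)
        }
        self.reference = {
            '*': {'triad': (50000, -0.1, None, 'MB/s')}   # placeholder reference value
        }
```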

For this work the student will have access to SURFsara’s HPC systems (Cartesius supercomputer and Lisa cluster) where an instance of XALT is already deployed. In addition the student will deploy and benchmark the selected applications on different systems to extend the workflows and compare performances across different systems. Depending on availability, these may include AMD and ARM test systems, as well as systems of other European HPC sites with which SURFsara has established collaborations. SURFsara is indeed involved in several European projects where benchmarking, co-design, and performance tuning play an important role. Depending on the interests of the intern, the selected applications, and the needs of the stakeholders within the different initiatives, the work of the intern can be linked to one or more of the following projects:

  • CompBioMed – Support the Computational Biomedicine community and its diverse set of applications, users and usage scenarios in its High Performance Computing (HPC), High Throughput Computing (HTC), and High-Performance Data Analytics (HPDA) needs.
  • EPI – designing low-power European processors for extreme scale computing, high-performance Big-Data and emerging applications. SURFsara is involved in the co-design task.
  • PRACE Preparatory Access project 5047 with the applied research institute Deltares – enabling faster computations on supercomputers of a state-of-the-art hydrodynamics modelling suite.

The intern will be fully integrated into the Supercomputing team at SURFsara, and the produced results and outcomes will be used directly to improve the supercomputer’s ecosystem and our users’ experience. If the intern shows particular interest and skills, part of his/her work could be devoted to more in-depth profiling and tuning in the context of one of the above-mentioned projects.

Automated workflows used for the deployment of the software stack on SURFsara’s HPC systems.

Project Mentor: Maxime Mogé

Project Co-mentor: Sagar Dolas

Site Co-ordinator: Carlos Teijeiro Barjas

Participants: Elman Hamdi, Jesús Molina Rodríguez de Vera

Learning Outcomes:
In addition to discovering what working at a supercomputing centre looks like, the intern will learn about the main characteristics of current and emerging HPC architectures and the usage of modern automation tools for software building, testing, continuous integration and benchmarking. He/she will get hands-on experience with porting, profiling and tuning scientific software, with the outcome of his/her work being directly applicable for the benefit of SURFsara’s users.

Student Prerequisites (compulsory):

  • Basic Unix commands.
  • Knowledge of Linux as development environment.

Student Prerequisites (desirable):

  • Basic knowledge of python.
  • Basic experience with software compilation.
  • Experience with regression testing and benchmarking.
  • Knowledge of computer and HPC systems architecture

Training Materials:

Main training material:

Additional training material (technical):

Workplan:

  • week 1: SoHPC training week
  • week 2: Get familiar with the hardware available at SURFsara and with the regression and testing framework
  • week 3: Select applications and test cases, analyse characteristics of the different available architectures.
  • week 4-5: Integrate application(s) in the framework.
  • week 6-7: Benchmark and analyse performances.
  • week 8: Write recommendations for efficient usage of the application(s) and final report.

Final Product Description:
The project will result in the integration of the selected application in the automated regression and performance testing framework, and will produce detailed benchmarks and performance profiles on different HPC systems and architectures. Through the project we will develop a better understanding of the performances of the selected application and write recommendations for efficient deployment and usage on HPC systems.

If the selected application is part of CompbioMed and if the project is successful, the results could be presented at CompBioMed organised events. If the selected application is Deltares’ D-Flow FM software, the results could be presented at a project meeting of the PRACE Preparatory Access project.

Adapting the Project: Increasing the Difficulty:

  • Do more in-depth performance analysis of the selected applications.
  • Port and benchmark the applications on different architectures (depending on availability).
  • Write guidelines for efficient porting and usage on HPC systems.
  • Select more applications.

Adapting the Project: Decreasing the Difficulty:
The goal of the project could be restricted to integrating a single application in the automated testing workflow.

Resources:
HPC systems and test systems: SURFsara will grant access to its HPC systems Lisa and Cartesius and to its test systems (AMD processors, ARM processors, etc.).
Other European HPC systems: if relevant, access will be granted via the above-mentioned projects (CompBioMed, EPI, PRACE 6IP WP7 PA5047)
Software: the tools and frameworks used in this project are open source. For target applications that are not open source, SURFsara’s licenses will be used. Access to applications with special access rules (e.g. Deltares’ software D-Flow FM if it is chosen as a target application) will be arranged through the relevant projects (PRACE 6IP WP7 PA5047 in this case).

Organisation:
SURFsara B.V.

Project reference: 2016

DPD is a stochastic particle method for mesoscale simulations of complex fluids. It is based on a coarse approximation of the molecular structure of soft materials, with beads that can represent large agglomerates or unions of simpler molecules like water. This approach avoids the extremely small time and length scales of classical Molecular Dynamics solvers while retaining the intrinsic discrete nature of matter. However, realistic applications often require a very large number of beads to correctly simulate the physics involved; hence the need to scale to very large systems using the latest hybrid CPU-GPU architectures.

The focus of this work will be on benchmarking and optimizing, on novel supercomputers, an existing multi-GPU version of the DL_MESO (DPD) code, a discrete particle solver for mesoscale simulation of complex fluids. Currently it has been tested on up to 4096 GPUs and needs further development to scale well on larger systems, as well as to improve its performance per single GPU. The code has been tested only on cases of medium complexity, like flow between surfaces, vapour-liquid systems and water droplet phenomena.

As a minimum, the student will benchmark the current version, modify the factors currently limiting scaling on large supercomputers, and run performance analyses to identify possible bottlenecks and corresponding solutions for speedup. Depending on his/her experience, further improvements on the HPC side as well as new features for complex physics could be added. In particular, he/she will focus on porting the constant-pressure barostat options into the multi-GPU CUDA version of DL_MESO.

Project Mentor: Jony Castagna

Project Co-mentor: Vassil Alexandrov

Site Co-ordinator: Luke Mason

Participants: Davide Crisante, Nursima ÇELİK

Learning Outcomes:
The student will learn to benchmark, profile and modify multi-GPU code, mainly written in Fortran and CUDA, following a typical domain decomposition implemented using MPI libraries. S/he will also gain a basic understanding of the DPD methodology and its impact on mesoscale simulations. The student will also gain familiarity with proper software development procedures, using software for version control, an IDE and tools for parallel profiling on GPUs.

Student Prerequisites (compulsory):
A good knowledge of Fortran, MPI and CUDA programming is required as well as in parallel programming for distributed memory.

Student Prerequisites (desirable):
Some skill in developing mixed code such as Fortran/CUDA will be an advantage, as well as experience in multi-GPU programming using CUDA/MPI.

Training Materials:
These can be tailored to the student once he/she is selected.

Workplan:

  • Week 1: Training week
  • Week 2: Literature review and preliminary report (plan writing)
  • Weeks 3 – 7: Project development
  • Week 8: Final report write-up

Final Product Description:
The final product will be an internal report, convertible to a conference or, better, a journal paper, together with an improved version of the DL_MESO multi-GPU code and benchmarks comparing it against the current version.

Adapting the Project: Increasing the Difficulty:
The project is at the appropriate cognitive level, taking into account the timeframe and the need to submit a final working product and one report.

Adapting the Project: Decreasing the Difficulty:
The topic will be researched and the final product will be designed in full, but some of the features may not be developed, to ensure a working product with somewhat limited features at the end of the project.

Resources:
The student will need access to a multi GPUs machines, standard computing resources (laptop, internet connection).

Organisation:
Hartree Centre – STFC


Project reference: 2015

The broad userbase of clusters is not familiar with the ins-and-outs of the systems they are working on and such familiarity is not really necessary.

As a result, many will enqueue jobs with the maximum available run time rather than trying to provide good estimates of the execution time of their jobs. In the best case this leads to the job’s priority being low and the starting time for the job permanently moving into the future.

In the worst case this may lead to idle resources due to the scheduler preventing the launch of shorter jobs due to priority constraints.

Project Objectives: Use simple machine learning frameworks (e.g. TensorFlow) to estimate the execution time of a user’s jobs based on:

  1. The user’s history
  2. Job name
  3. Job being part of an array.

At first it will be sufficient to try to provide tighter bounds on the execution time of array jobs. As a next step, one would try to update the run-time bounds for the enqueued remainder of an array job. Finally, should the above prove fruitful, the approach will be extended to all of the user’s jobs, and an additional flag to SLURM or a wrapper script should be implemented to allow a user to opt in to such automatic run-time updates. The approach will be tested on real-life problems and HPC loads.
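A minimal sketch of the estimation idea is shown below: a small Keras regression model maps a few job features (the user's historical mean run time, a hashed job name and an array-job flag) to a predicted execution time. The features, the synthetic training data and the padding suggestion are illustrative assumptions; real training data would come from the SLURM accounting records.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for data extracted from SLURM accounting (illustrative only).
rng = np.random.default_rng(0)
n = 2000
user_hist = rng.uniform(60, 3600, size=n)        # user's mean past run time [s]
name_hash = rng.integers(0, 50, size=n) / 50.0   # crude hashed job-name feature
is_array = rng.integers(0, 2, size=n).astype(float)
X = np.stack([user_hist / 3600.0, name_hash, is_array], axis=1)
# "True" run time loosely tied to the features, for demonstration only.
y = 0.9 * user_hist + 300.0 * is_array + rng.normal(0, 60, size=n)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),                    # predicted run time in seconds
])
model.compile(optimizer="adam", loss="mae")      # MAE is robust to run-time outliers
model.fit(X, y, epochs=20, batch_size=64, validation_split=0.2, verbose=0)

# A predicted time, padded by e.g. 20%, could then be fed back to the scheduler
# via `scontrol update jobid=<id> TimeLimit=...` to tighten the requested limit.
print(model.predict(X[:3]))
```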

Project Mentor: Vassil Alexandrov

Project Co-mentor: Anton Lebedev

Site Co-ordinator: Luke Mason

Participants: Francesca Schiavello, Ömer Faruk Karadaş

Learning Outcomes:
The student will first learn a variety of advanced ML approaches and how to apply these in conjunction with workload managers such as SLURM. The developed programs will be applied to synthetic and real HPC workloads for clusters of varying size.

Student Prerequisites (compulsory):
Good knowledge of Python.

Student Prerequisites (desirable):
TensorFlow or other ML frameworks with Python interfaces.
Basics of SLURM usage.

Training Materials:
These can be tailored to the student once he/she is selected, but the SLURM user’s manual will be an integral part of the materials.

Workplan:

  • Week 1: Training week
  • Week 2: Literature review and preliminary report (plan writing)
  • Weeks 3 – 7: Project development and evaluation
  • Week 8: Final report write-up

Final Product Description:
The final product will be an enhanced SLURM workload manager tested on real life problems and HPC loads.

Adapting the Project: Increasing the Difficulty:
The project is at the appropriate cognitive level, taking into account the timeframe and the need to submit a final working product and two reports.
Provided there is sufficient progress, the student’s input on the development objectives may be included.

Adapting the Project: Decreasing the Difficulty:
The topic will be researched and the final product will be designed in full, but some of the features may not be developed, to ensure a working product with somewhat limited features at the end of the project.

Resources:
The student will need access to a relevant HPC machine to test the approach, which can be GPU and/or multicore based, plus standard computing resources (laptop, internet connection).

Organisation:
Hartree Centre – STFC


Project reference: 2014

deMon2k (density of Montréal) [1] is a software package for density functional theory (DFT) calculations. It uses the linear combination of Gaussian-type orbitals (LCGTO) approach for the self-consistent solution of the Kohn-Sham (KS) DFT equations. RT-TDDFT (Real-Time Time-Dependent Density Functional Theory) has been implemented by Aurélien de la Lande’s group at LCP Orsay. It is based on the Magnus propagation and involves the exponentiation of a general complex matrix without any special properties. In order to evaluate it, three different schemes have been implemented in the current source code by the group: diagonalization, the Taylor expansion and the Baker-Campbell-Hausdorff scheme. Each of these methods involves linear algebra operations which consume most of the CPU resources during RT-TDDFT calculations and can be extremely time consuming for large systems. In order to assess the Magnus propagation in a parallel programming context, a strategy based on the ScaLAPACK/MPI [2] library has been implemented for the Taylor expansion. Until now, this work has been done only for CPU architectures. The aim of this internship will be to write a basic prototype which implements the matrix exponentiation for GPU architectures with one of the three methods cited above. To do so, the main strategy will be to use the MAGMA library [3] (a LAPACK implementation for GPUs). During this internship, the students will work at the “Maison de la Simulation” [4], which is one of the most important HPC institutions in France. They will be able to learn some basic programming in CUDA and how to do parallel linear algebra operations with the MAGMA library. The students will get access to the Jean Zay supercomputer located at IDRIS [5], which has more than 1000 NVIDIA Tesla V100 GPUs. The Jean Zay supercomputer is one of the most powerful machines in the world, ranked 46th in the Top500 list of November 2019.
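To make the Taylor-expansion scheme concrete, here is a small CPU-side NumPy sketch of the truncated series exp(A) = sum_k A^k / k!, checked against SciPy; the GPU prototype targeted by the internship would express the same repeated matrix products through MAGMA GEMM calls rather than NumPy. The matrix size, norm and number of terms are illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

def expm_taylor(A, nterms=30):
    """Truncated Taylor series of exp(A) (suitable for matrices of modest norm)."""
    result = np.eye(A.shape[0], dtype=A.dtype)
    term = np.eye(A.shape[0], dtype=A.dtype)
    for k in range(1, nterms + 1):
        term = term @ A / k          # builds A^k / k! one matrix product at a time
        result = result + term
    return result

# Complex test matrix, since RT-TDDFT exponentiates a general complex matrix.
rng = np.random.default_rng(1)
A = (rng.standard_normal((200, 200)) + 1j * rng.standard_normal((200, 200))) * 0.05

# Compare against SciPy's reference implementation (should agree to ~1e-14 here).
print(np.max(np.abs(expm_taylor(A) - expm(A))))
```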

References:
[1] http://www.demon-software.com/public_html/index.html
[2] https://netlib.sandia.gov/scalapack/
[3] http://icl.cs.utk.edu/magma/
[4] http://www.maisondelasimulation.fr/
[5] http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html

The Jean Zay supercomputer at IDRIS will be used during the internship. It has more than 1,000 NVIDIA Tesla V100 GPUs and was ranked 46th in the Top500 list of November 2019. The picture is free to use provided the author is credited as follows: C.Frésillon / IDRIS / CNRS Photothèque.

Project Mentor: Karim Hasnaoui

Project Co-mentor: Aurélien de la Lande

Site Co-ordinator: Karim Hasnaoui

Participants: Pablo Antonio Martínez Sánchez, Theresa Vock

Learning Outcomes:
The students will learn how to perform parallel linear algebra on GPUs through the MAGMA library. They will also learn the basics of CUDA programming and how to write interfaces between the Fortran and C languages, and they will become familiar with a supercomputer environment (Linux system, job scheduler, etc.).

Student Prerequisites (compulsory):

  • Basic knowledge in linear algebra
  • Basic knowledge in Fortran or C language

Student Prerequisites (desirable):

  • Basic knowledge in Linux environment
  • Knowledge in basic editors (vi or emacs)

Training Materials:
Basic Linux (see Section 2):
https://www.tecmint.com/free-online-linux-learning-guide-for-beginners/

Vim editor tutorial:
https://www.tutorialspoint.com/vim/index.htm

The MAGMA library:
http://icl.cs.utk.edu/magma/

Accommodation:
http://www.u-psud.fr/en/campus-life/accommodation.html?search-keywords=housing
For accommodation, the student can directly contact me and I’ll help them to find a place for their stay.

Workplan:
Timeline for the completion of the project:

  • Week 1 & 2: Learn how to use the magma library, basic knowledge on Linux and how to use the job scheduler at Jean Zay.
  • Week 3: write the prototype of matrix exponentiation with the MAGMA library and quick benchmarking
  • Week 4 & 5: Learn how to write interfaces between Fortran and C language
  • Week 6: Rewrite the prototype in Fortran by using interfaces
  • Week 7: Final prototype benchmarks
  • Week 8: Final report

Final Product Description:
The aim of the project is to write a prototype that calculates the matrix exponential on a GPU architecture and to benchmark it. If the results are conclusive, the prototype will be included in the deMon2k code.

Adapting the Project: Increasing the Difficulty:
/

Adapting the Project: Decreasing the Difficulty:
/

Resources:
The students will get access to the Jean Zay machine, the new converged supercomputer platform acquired by the French Ministry of Higher Education, Research and Innovation through the French civil company GENCI. Jean Zay is an HPE SGI 8600 computer consisting of two partitions. The first partition is composed of scalar nodes and the second of “converged” nodes, or more precisely “converged accelerated hybrid nodes”. These hybrid nodes are equipped with both CPUs and GPUs, which supports the usages associated with both HPC and AI. The second partition contains more than 1,000 GPUs. In November 2019, Jean Zay was ranked 46th in the Top500 list.

http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html

Organisation:
Maison de la Simulation (CEA/CNRS)

Project reference: 2013

As scalable hardware for quantum computers becomes more of a reality, the development of a software stack and working applications becomes increasingly important, both to investigate what types of problems are well suited to quantum architectures and to ensure that production-ready software exists. A wide range of applications are being developed as proofs of concept for quantum computing, in domains ranging from natural language processing to chemistry. Due to the restricted availability of functioning and operable quantum computers, a commonly used development tool is the quantum simulator. Quantum simulators simulate the underlying mathematics of quantum computers at the gate level. They can be run on classical computing systems, including both CPUs and GPUs. However, the depth of the gate-based operations, along with the large memory overhead of representing quantum states in a simulator, makes it impossible to run larger-scale experiments on a standard desktop or laptop. Thus, high-performance computing (HPC) systems are essential for larger-scale applications.
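To make the memory overhead concrete: a full state vector of n qubits holds 2^n complex amplitudes, so at 16 bytes per double-precision complex amplitude the storage grows as in the short sketch below (the qubit counts are arbitrary examples).

```python
# State-vector memory: 2**n amplitudes, 16 bytes each (double-precision complex).
for n in (26, 30, 34, 40):
    amplitudes = 2 ** n
    gib = amplitudes * 16 / 2 ** 30
    print(f"{n} qubits: {amplitudes:,} amplitudes ~ {gib:,.1f} GiB")
```

Beyond roughly 30 qubits the state vector no longer fits in a typical laptop's memory, which is why larger simulations must be run on an HPC system.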

String comparison has been shown to be a problem that maps well to quantum architectures. Each quantum state is represented by a binary string in its register. An arbitrary non-binary string can then be encoded into this binary string. Thus, we can represent each non-binary string as a unique quantum state. There are a number of methods which can then be used to explore the similarity of quantum states to a test state.
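As an illustration of this encoding step, here is a plain-Python sketch that maps nucleotide strings onto labels of computational basis states; the 2-bits-per-letter mapping and the Hamming-distance check are illustrative choices, not the specific scheme used in the project.

```python
# Two classical bits per nucleotide, so a length-L sequence labels one
# computational basis state out of 2**(2L).
ENCODING = {"A": "00", "C": "01", "G": "10", "T": "11"}

def sequence_to_basis_state(seq):
    bits = "".join(ENCODING[base] for base in seq.upper())
    return bits, int(bits, 2)

test = "ACGT"
for pattern in ["ACGT", "ACGG", "TTGA"]:
    bits, index = sequence_to_basis_state(pattern)
    # Hamming distance in the binary register is a simple classical proxy
    # for the similarity that the quantum circuit estimates.
    distance = sum(a != b for a, b in zip(bits, sequence_to_basis_state(test)[0]))
    print(f"{pattern} -> |{bits}>  (basis state {index}), distance to {test}: {distance}")
```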

In this project, we will develop an application which exploits the reversibility of quantum circuits to quantify the similarity between a set of patterns and a test pattern. The final application will take a list of genome sequences and determine their similarity to a test genome sequence. Rigetti’s Forest SDK will be used to implement the solution. The Forest SDK components that will be used include a Python library for quantum circuit development (PyQuil) and Rigetti’s Quantum Virtual Machine (QVM), which is a quantum simulator. Upon completion of the solution, the application’s performance at different scales will be examined.
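As a toolchain smoke test (not the pattern-matching circuit itself), a minimal PyQuil program along the following lines can be run against the QVM. This sketch assumes pyquil 2.x with the Forest SDK's qvm and quilc servers running locally (qvm -S, quilc -S).

```python
from pyquil import Program, get_qc
from pyquil.gates import H, CNOT, MEASURE

# Prepare a two-qubit entangled (Bell) state and measure both qubits.
p = Program()
ro = p.declare("ro", "BIT", 2)           # classical readout register
p += H(0)
p += CNOT(0, 1)
p += MEASURE(0, ro[0])
p += MEASURE(1, ro[1])
p = p.wrap_in_numshots_loop(100)         # repeat the circuit 100 times

qc = get_qc("2q-qvm")                    # 2-qubit Quantum Virtual Machine
results = qc.run(qc.compile(p))
print(results[:5])                       # rows should read [0, 0] or [1, 1]
```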

A Bloch sphere illustrating a qubit generated using the QuTiP library in Python.

Project Mentor: Myles Doyle

Project Co-mentor: Venkatesh Kannan

Site Co-ordinator: Simon Wong

Participants: Benedict Braunsfeld, Sara Duarri Redondo

Learning Outcomes:

  • Deeper understanding of quantum computing techniques and the underlying principles of quantum computing
  • Experience using the Forest SDK and other quantum SDKs.
  • Creating a publicly demonstrable string comparison application that can be executed on either a quantum simulator or a quantum processing unit.
  • Use of an HPC system and scaling of the application.
  • Exposure to larger projects in which the techniques and results of this project are integral.

Student Prerequisites (compulsory):

  • Experience with Linux/Unix.
  • Programming with Python or C++.

Student Prerequisites (desirable):

  • Basic knowledge of quantum mechanics (specifically related to quantum computing, but any knowledge is an advantage).
  • Experience with any quantum computing frameworks (Forest, Atos QVM, Qiskit, Q#, Intel-QS, etc).

Training Materials:

Workplan:

  • Week 1: Training week.
  • Week 2: Induction, overview of project, introduction to relevant quantum computing concepts, HPC system and Forest SDK.
  • Week 3: Implement standard exercises for quantum computing, then implement Qiskit’s string comparison example using the Forest SDK.
  • Week 4: Design implementation methodology and write project plan.
  • Week 5: Implement an application to conduct the string comparison algorithm for genetic pattern matching of arbitrary genome sequences.
  • Week 6: Conduct scaling experiments on Kay and produce relevant visualisations of results.
  • Week 7: Documentation of application and report writing.
  • Week 8: Report writing and demonstration to ICHEC researchers.

Final Product Description:
A genome pattern matching application which quantifies the similarity of a population of genome sequences to a test sequence using the Forest SDK and executed using a quantum simulator.
Performance scaling results of the application.

Adapting the Project: Increasing the Difficulty:
The student could implement different string comparison algorithms which require more complicated quantum circuits and compare the results of the different methods (possible methods include using a conditional Oracle, quantum associative memory, or using a quantum phone directory approach).
The application could be run using the simulator with noise models enabled. Similarly, a real quantum computer could be used (IBM has free public access to some of their smaller systems).

Adapting the Project: Decreasing the Difficulty:
The student could implement a simpler version of the application which targets genome sequences consisting of binary values instead of multiple values (four nucleotides: ‘A’, ‘C’, ‘G’ and ‘T’).

Resources:

  1. Access to HPC system (ICHEC’s HPC system, Kay, will be made available)
  2. Laptop (to be brought by the student)
  3. Rigetti’s Forest SDK (publicly available and relevant components will be pre-installed on Kay)
  4. Atos QVM (Optional) (publicly available)
  5. Access to Atos QLM (Optional and to be confirmed) (Access provided by ICHEC)
  6. IBM Q Experience (publicly available)

Organisation:
Irish Centre for High-End Computing

Project reference: 2012

Today’s supercomputing hardware provides a tremendous amount of floating-point operations (FLOPs). While CPUs are designed to minimize the latency of a stream of individual operations, GPUs try to maximize the throughput. However, GPU FLOPs can only be harvested easily if the algorithm exhibits plenty of independent data parallelism. Hierarchical algorithms like the Fast Multipole Method (FMM) inhibit the utilization of all available FLOPs on GPUs due to their inherent data dependencies and limited independent data parallelism.

Is it possible to circumvent these problems?

In this project we turn our efforts towards a fully taskified FMM for GPUs. Depending on your scientific background we will pursue different goals. First, the already available GPU tasking framework needs to be coupled with the full FMM “shift” and “translation” operators to enable a larger range of accuracies.  Second, a special reformulation of the mathematical FMM operators can be implemented and tested to increase the efficiency of such hierarchical methods for GPUs even further.

The challenge of both assignments is to execute tiny to medium-size compute kernels without large overheads within the tasking framework. This also ensures portability between different generations/designs of modern GPUs.

What is the fast multipole method? The FMM is a Coulomb solver that computes the long-range forces arising in molecular dynamics, plasma physics or astrophysics. A straightforward approach is limited to small particle numbers N due to its O(N^2) scaling. Fast summation methods such as PME, multigrid or the FMM reduce the algorithmic complexity to O(N log N) or even O(N). However, each fast summation method comes with auxiliary parameters, data structures and memory requirements that must be provided. The layout and implementation of such algorithms on modern hardware strongly depend on the features of the underlying architecture.
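For contrast with the FMM, the following NumPy sketch spells out the direct O(N^2) evaluation of the Coulomb potential that fast summation methods replace; unit charges and a unit Coulomb constant are assumed for simplicity.

```python
import numpy as np

def direct_coulomb_potential(positions, charges):
    """Direct O(N^2) evaluation of phi_i = sum_{j != i} q_j / |r_i - r_j|."""
    n = len(positions)
    phi = np.zeros(n)
    for i in range(n):
        diff = positions - positions[i]          # vectors to all other particles
        dist = np.linalg.norm(diff, axis=1)
        dist[i] = np.inf                         # exclude the self-interaction
        phi[i] = np.sum(charges / dist)
    return phi

rng = np.random.default_rng(42)
N = 1_000
positions = rng.random((N, 3))                   # random particles in a unit box
charges = np.ones(N)                             # unit charges for simplicity
print(direct_coulomb_potential(positions, charges)[:3])
```

Doubling N quadruples the runtime of this loop, which is exactly what makes an O(N) method like the FMM attractive for large particle counts.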

Inside the brain of the 2020 PRACE SoHPC student after two weeks

Project Mentor: Ivo Kabadshow

Project Co-mentor: Laura Morgenstern

Site Co-ordinator: Ivo Kabadshow

Participants: Josip Bobinac, Igor Abramov

Learning Outcomes:
The student will familiarize himself/herself with current state-of-the-art GPUs (NVIDIA P100/V100). He/she will learn how the GPU/accelerator functions at a low level and use this knowledge to utilize/extend a tasking framework for GPUs in a modern C++ code base. He/she will use state-of-the-art benchmarking/profiling tools to test and improve the performance of the tasking framework and its compute kernels, which are time-critical in the application. Special emphasis will be placed on the efficient design of data structures and access patterns. Different mathematical representations of the “same thing” can lead to substantial differences in performance on different hardware.

Student Prerequisites (compulsory):

  • At least 5 years of programming experience in C++
  • Basic understanding of template metaprogramming
  • “Extra-mile” mentality

Student Prerequisites (desirable):

  • CUDA or general GPU knowledge desirable, but not required
  • C++ template metaprogramming
  • Interest in C++11/14/17 features
  • Interest in low-level performance optimizations
  • Ideally student of computer science, mathematics, but not required
  • Basic knowledge on benchmarking, numerical methods
  • Mild coffee addiction
  • Basic knowledge of git, LaTeX, TikZ

Training Materials:
Just send an email … training material strongly depends on your personal level of knowledge. We can provide early access to the GPU cluster as well as technical reports from former students on the topic. If you feel unsure about the requirements, but do like the project, send an email to the mentor and ask for a small programming exercise.

Workplan:

Week – Work package

  1. Training and introduction to FMMs and GPU hardware
  2. Benchmarking of current FMM operators on the CPU
  3. Adding basic FMM operators to the GPU tasking framework
  4. Extending FMM operators to support multiple compute kernels
  5. Adding reformulated (more compact) FMM operators
  6. Performance tuning of the GPU code
  7. Optimization and benchmarking, documentation

Generation of final performance results. Preparation of plots/figures. Submission of results.

Final Product Description:
The final result will be a taskified FMM code with CUDA support for GPUs. The benchmarking results, especially the gain in performance, can easily be illustrated in appropriate figures, as is routinely done by PRACE and HPC vendors. Such plots could be used by PRACE.

Adapting the Project: Increasing the Difficulty:
The tasking framework uses different compute kernels. For example, it may or may not be required to provide support for a certain FMM operator. A particularly able student may also apply the GPU tasking to multiple compute kernels. Depending on the knowledge level, a larger number of access/storage strategies can be implemented, or performance optimization within CUDA can be intensified.

Adapting the Project: Decreasing the Difficulty:
As explained above, a student who finds the task of adapting/optimizing the FMM operators for all compute kernels too challenging could very well restrict himself/herself to a simpler model or a partial set of FMM operators.

Resources:
The student will have his own desk in an air-conditioned open-plan office (12 desks in total) or in a separate office (2-3 desks in total), will get access (and computation time) on the required HPC resources for the project and have his own workplace with a fully equipped workstation for the time of the program. A range of performance and benchmarking tools are available on site and can be used within the project. No further resources are required. Hint: We do have experts on all advanced topics, e.g. C++11/14/17, CUDA in house. Hence, the student will be supported when battling with ‘bleeding-edge’ technology.

Organisation:
Jülich Supercomputing Centre

Project reference: 2011

Simulations of classical or quantum field theories often rely on a lattice discretized version of the underlying theory. For example, simulations of Lattice Quantum Chromodynamics (QCD, the theory of quarks and gluons) are used to study properties of strongly interacting matter and can, e.g., be used to calculate properties of the quark-gluon plasma, a phase of matter that existed a few milliseconds after the Big Bang (at temperatures larger than a trillion degrees Celsius). Such simulations take up a large fraction of the available supercomputing resources worldwide.

Other theories have a lattice structure already “built in”, as is the case for graphene with its famous honeycomb structure. Simulations studying this material can build on the experience gathered in Lattice QCD. These simulations require, e.g., the repeated solution of extremely sparse linear systems and the update of the degrees of freedom using symplectic integrators.
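To give a feel for the dominant kernel, here is a small SciPy sketch that solves an extremely sparse linear system with the conjugate gradient method; the 2D Laplacian used here is only a stand-in for the actual lattice operator, and the grid size and tolerance are arbitrary.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Sparse 2D Laplacian on an n x n grid, a stand-in for the lattice operator.
n = 64
N = n * n
main = 4.0 * np.ones(N)
off = -np.ones(N - 1)
off[np.arange(1, N) % n == 0] = 0.0   # no coupling across grid-row boundaries
far = -np.ones(N - n)
A = sp.diags([main, off, off, far, far], [0, -1, 1, -n, n], format="csr")

rng = np.random.default_rng(1)
b = rng.standard_normal(N)

x, info = cg(A, b)                    # info == 0 signals convergence
print("converged:", info == 0, "residual:", np.linalg.norm(A @ x - b))
```

In the actual simulations such solves recur inside every step of the (symplectic) molecular-dynamics integration, which is why their single-node and multi-node performance dominates the overall cost.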

Depending on personal preference, the student can decide to work on graphene or on Lattice QCD. He/she will be involved in tuning and scaling the most critical parts of a specific method, or attempt to optimize for a specific architecture in the algorithm space.

In the former case, the student can select among different target architectures, ranging from Intel Xeon Phi (KNL) and Intel Xeon (Haswell/Skylake) to GPUs (OpenPOWER), which are available in different installations at the institute. To that end, he/she will benchmark the method and identify the relevant kernels. He/she will analyse the performance of the kernels, identify performance bottlenecks, and develop strategies to solve these, if possible taking similarities between the target architectures (such as SIMD vectors) into account. He/she will optimize the kernels and document the steps taken in the optimization as well as the performance results achieved.

In the latter case, the student will, after getting familiar with the architectures, explore different methods by either implementing them or using those that have already been implemented. He/she will explore how the algorithmic properties match the hardware capabilities. He/she will test the achieved total performance and study bottlenecks, e.g. using profiling tools. He/she will then test the method at different scales and document the findings.

In any case, the student is embedded in an extended infrastructure of hardware, computing, and benchmarking experts at the institute.

QCD & HPC

Project Mentor: Dr. Stefan Krieg

Project Co-mentor: Dr. Eric Gregory

Site Co-ordinator: Ivo Kabadshow

Participants: Anssi Tapani Manninen, Aitor López Sánchez

Learning Outcomes:
The student will familiarize himself with important new HPC architectures, such as Intel Xeon, OpenPOWER or other accelerated architectures. He/she will learn how the hardware functions on a low level and use this knowledge to devise optimal software and algorithms. He/she will use state-of-the art benchmarking tools to achieve optimal performance.

Student Prerequisites (compulsory):

  • Programming experience in C/C++

Student Prerequisites (desirable):

  • Knowledge of computer architectures
  • Basic knowledge on numerical methods
  • Basic knowledge on benchmarking
  • Computer science, mathematics, or physics background

Training Materials:

Supercomputers @ JSC
http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/supercomputers_node.html

Architectures
https://developer.nvidia.com/cuda-zone
http://www.openacc.org/content/education

Paper on MG with introduction to LQCD from the mathematician’s point of view: http://arxiv.org/abs/1303.1377

Introductory text for LQCD:
http://arxiv.org/abs/hep-lat/0012005
http://arxiv.org/abs/hep-ph/0205181

Introduction to simulations of graphene:
https://arxiv.org/abs/1403.3620
https://arxiv.org/abs/1511.04918

Workplan:

Week – Work package

  1. Training and introduction
  2. Introduction to architectures
  3. Introductory problems
  4. Introduction to methods
  5. Optimization and benchmarking, documentation
  6. Optimization and benchmarking, documentation
  7. Optimization and benchmarking, documentation

Generation of final performance results. Preparation of plots/figures. Submission of results.

Final Product Description:
The end product will be a student educated in the basics of HPC, optimized methods/algorithms, or HPC software.

Adapting the Project: Increasing the Difficulty:
The student can choose to work on a more complicated algorithm or aim to optimize a kernel using more low level (“down to the metal”) techniques.

Adapting the Project: Decreasing the Difficulty:
A student who finds the task of optimizing a complex kernel too challenging could restrict himself/herself to simple or toy kernels, in order to still have a learning experience. Alternatively, if the student finds a particular method too complex for the time available, a less involved algorithm can be selected.

Resources:
The student will have his own desk in an open-plan office (12 desks in total) or in a separate office (2-3 desks in total), will get access (and computation time) on the required HPC hardware for the project and have his own workplace with fully equipped workstation for the time of the program. A range of performance and benchmarking tools are available on site and can be used within the project. No further resources are required.

Organisation:
Jülich Supercomputing Centre
