Be Ready for Information Overload About My Internship

Hello everyone, it’s Elman, again!!!
If you have read my first blog post, you know that I am an intern at SURFsara. In this post I will give you some basic information about HPC systems and what I do in Summer of HPC. Now fasten your seat belts and be ready for my information overload about HPC!!!
First, you should know what an HPC cluster is. HPC is short for High Performance Computing, but what does that mean? Please look at the photo below.

This is an HPC cluster, or as we prefer to say, a supercomputer, which is used at SURFsara. It is named Cartesius. Every weekday, I connect to the supercomputer from Turkey to run some tests, play with the system and gather information about it.
HPC clusters, or supercomputers, have very high processing power because they contain a large number of powerful GPUs and CPUs that work in parallel. This capacity is used to run scientific applications, for example ORCA, an application for nanochemistry. Scientists use such applications to gather data for their research.
If you want to run a test on a supercomputer, you should know some parallelization tools. I use MPI and OpenMP in my internship. They are used to execute calculations on a parallel system. Without them, you can only use a small part of a supercomputer’s processing power. For example, say you have a big function that needs a lot of computation. You use a program to split the work into small pieces, calculate these pieces on different nodes, and then merge the partial results into one output. That program can use MPI, OpenMP, or both of them at the same time.
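To make the split, calculate and merge idea more concrete, here is a minimal sketch in Python. It is not MPI or OpenMP: it uses threads on a single machine from the standard library as a stand-in for the nodes, and the function names (`split_range`, `partial_sum_of_squares`, `parallel_sum_of_squares`) are made up for this illustration. It only shows the pattern, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split the integers 0..n-1 into `parts` contiguous (start, stop) chunks."""
    base, extra = divmod(n, parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks get one additional element each.
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum_of_squares(chunk):
    """The work of one worker: sum i*i over its own chunk only."""
    start, stop = chunk
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Split the work, compute the pieces concurrently, then merge the results."""
    chunks = split_range(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Merge step: combine the partial results into one output.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(1000))
```

On a real cluster the chunks would go to MPI processes on different nodes, and the merge step would be a reduction across those processes instead of a plain `sum`.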
Now you know what HPC clusters are and what we use to parallelize work on them. But the main problem is that HPC systems are very heterogeneous environments. They have different parallelization approaches and different hardware solutions, and a wide variety of scientific software is offered to the system’s users. This heterogeneity is a big problem for users, and also for HPC maintainers, because maintainers need to know that everything works fast, smoothly and reliably on their system.
To overcome this issue, we use the ReFrame framework.
I know that this explanation of ReFrame is hard to understand for beginners, so let’s look at a scenario:
You have some input files and want to get an output, but you do not know how many nodes, and how many tasks per node, you should use. Basically, you don’t know how to configure your test so that it uses the system efficiently and gives you results quickly. Formerly, a user would write several configurations, execute each configured test separately, collect the results and look for the best performance… This is a time-consuming and very bad way to do it, because the user loses a lot of time running all these configured tests by hand. At SURFsara we use the ReFrame framework for this. You can define all of these tests in just one code file and execute it, and it will give you all the results in one file.
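The idea of one file that runs every configuration can be sketched in plain Python. This is not ReFrame’s API, just an illustration of the sweep it automates. The names `sweep`, `best_configuration` and `fake_run` are hypothetical, and `fake_run` uses a made-up cost formula instead of a real job, so its numbers are not measurements.

```python
import itertools
import json


def sweep(run_test, node_counts, tasks_per_node_options):
    """Run `run_test` once per (nodes, tasks_per_node) pair and collect the timings."""
    results = []
    for nodes, tasks in itertools.product(node_counts, tasks_per_node_options):
        seconds = run_test(nodes, tasks)
        results.append({"nodes": nodes, "tasks_per_node": tasks, "seconds": seconds})
    return results


def best_configuration(results):
    """Pick the configuration with the shortest run time."""
    return min(results, key=lambda r: r["seconds"])


def fake_run(nodes, tasks_per_node):
    """Hypothetical stand-in for submitting a real job (made-up cost model)."""
    return 100.0 / (nodes * tasks_per_node) + 0.5 * nodes


if __name__ == "__main__":
    results = sweep(fake_run, node_counts=[1, 2, 4], tasks_per_node_options=[8, 16])
    # All results together in one place, instead of one manual run per configuration.
    print(json.dumps(results, indent=2))
    print("best:", best_configuration(results))
```

In a real setup, `run_test` would submit the job to the cluster and return the measured run time.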
My first two assignments at SURFsara are about IOR and mdtest. mdtest is an MPI-based metadata benchmark tool. It can be used to evaluate the metadata performance of a file system.
What is metadata? If a book is the data, then the metadata is the book’s cover and table of contents. So basically, metadata is data that describes other data.
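You can see the difference between data and metadata on any computer. The small Python sketch below writes some text into a file (the data) and then asks the file system what it knows about that file (the metadata). The helper name `describe` and the file name `book.txt` are just examples for this post.

```python
import os
import tempfile


def describe(path):
    """Return a few pieces of metadata the file system keeps about `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": info.st_mtime,  # seconds since the epoch
        "permissions": oct(info.st_mode & 0o777),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "book.txt")
        with open(path, "w") as f:
            f.write("Once upon a time...")  # this is the data
        print(describe(path))  # this is the metadata
```

A metadata benchmark stresses exactly this kind of bookkeeping (creating, looking up and removing entries), not the reading and writing of the file contents.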
I cannot run mdtest while the system is in normal use. If I try, the file system might crash, because mdtest creates new directories and pushes the file system hard in order to measure its limits.
My second assignment is IOR (Interleaved or Random). It can be used for testing the performance of parallel file systems using various interfaces and access patterns.
In simple words, you can run some tests with IOR and get results about the supercomputer’s write, read and remove performance. Then you can compare the results and see under which conditions the performance is good. That gives you knowledge about your system’s performance and helps you detect problems on your supercomputer.
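To give a feeling for what “write, read and remove performance” means, here is a tiny single-machine sketch in Python. It is not IOR and does not use a parallel file system. It only times the three operations on one temporary file, and the function name `time_file_operations` is made up for this example.

```python
import os
import tempfile
import time


def time_file_operations(directory, size_bytes=1024 * 1024):
    """Time writing, reading and removing one file of `size_bytes` bytes."""
    path = os.path.join(directory, "testfile.bin")
    payload = b"\0" * size_bytes

    t0 = time.perf_counter()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to really push the data to storage
    t1 = time.perf_counter()

    with open(path, "rb") as f:
        data = f.read()
    t2 = time.perf_counter()

    os.remove(path)
    t3 = time.perf_counter()

    return {
        "write_s": t1 - t0,
        "read_s": t2 - t1,
        "remove_s": t3 - t2,
        "bytes_read": len(data),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(time_file_operations(tmp))
```

Keep in mind that reading a file right after writing it is usually served from memory, so the read time here is probably much better than what the disk can really do. A real benchmark has to be careful about such effects.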
I ran the IOR tests on Cartesius, the supercomputer hosted at SURFsara. After getting the output of the IOR tests, I am going to write a summary according to the assignment.
Thank you for reading! See you again in my other articles.