Can we converge HPC and Big Data?
Hi everyone, my name is Rajani Kumar Pradhan, and I am from India (Andhra Pradesh). Currently, I am pursuing my PhD in the Department of Water Resource and Environmental Modelling, at the Czech University of Life Sciences Prague. My research interest is centred around satellite precipitation, climate change, hydroclimatic variability and the global water cycle.
Before starting my PhD, I worked as a ‘Junior Research Fellow’ at Banaras Hindu University, India. Before that, I completed my master’s in environmental science at the Central University of Rajasthan, India. Although I had some general idea about high-performance computing (HPC) and Big data, the real encounter began when I faced some huge datasets. That was the moment I realized the importance of HPC in data analysis. I am thankful to my supervisor, Yannis Markonis, who introduced me to the PRACE Summer of HPC 2021 and shared information about it.
The journey started with a one-week training programme that ranged from the basics of HPC to some advanced topics. Unfortunately, we could not meet or attend in person because of the pandemic; however, the online meetings on Zoom were a unique experience of their own. During the training week, I was introduced to several new terms like MapReduce, OpenMP and MPI, most of which I had never heard before. To bring some more fun into the programme, the organizers also scheduled hands-on exercises following the lectures.
Coming to the main project, the programme offered several excellent and diverse projects. Among them, I selected project number 2133, ‘The convergence of HPC and Big data/HPDA’, which I found the most interesting and very relevant to my needs. The main motivation behind the project is to tackle the challenges in the convergence of Big data analysis with HPC.
The following week, I started the main project with Giovana Roda and my colleague Pedro Hernandez Gelado. We began with another training week and a basic introduction to Hadoop, HDFS, and Spark. At the same time, we had some exercises to practice on the Vienna Scientific Cluster (VSC-4). Initially, we started with Little Big Data (LBD) and later worked on VSC-4. In the upcoming weeks, we will work on some case study datasets on the cluster, and I could not be more excited about it! Stay tuned, I will be back with new updates in the upcoming weeks.
Until then, bye!