Parallelizing Earth Observation Workflow
Project reference: 2117
Geographic information science and remote sensing deal with data that carry spatial information. Unlike most other data, geospatial data have a footprint on the surface of the Earth as well as a time dimension. Although HPC has been readily adopted in other areas of the natural sciences, applying it to Earth observation data is not straightforward because of the complexity of the associated geospatial information. In addition, a single snapshot from a satellite instrument orbiting the Earth captures multiple layers of data at several spectral wavelengths, which turns satellite data into complex big data. Traditional image manipulation approaches are therefore inadequate for these Earth observation acquisitions. The nature of these data sets makes the need for HPC clear, but implementation faces several bottlenecks because the volume of geospatial data keeps expanding.
The Parallel Earth project aims to apply a traditional MPI implementation to the Earth observation workflow so that time- and resource-consuming tasks can be executed in parallel without losing any associated information. The case study will involve processing satellite images from the Sentinel-2 mission to compute the Normalized Difference Vegetation Index (NDVI), which indicates the presence of vegetation on the ground or water. The computation workflow will combine Python-based processing with the Java-based Graph Processing Tool (GPT) in the SNAP software used to handle the imagery. MPI for Python will be used to optimize the workflow. At the end of the project, an optimal yet scalable approach will be identified that can be replicated for similar processing.
Project Mentor: Sita Karki
Project Co-mentor: Manuel Fernandez
Site Co-ordinator: Simon Wong
Students will be able to work with Earth observation data and optimize the workflow to deal with the big data coming from satellite missions. They will gain experience in manipulating high-resolution imagery while taking advantage of high-performance computing.
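To give a flavour of the case study described above, the following is a minimal sketch of an NDVI computation distributed with MPI for Python. NDVI is defined as (NIR - Red) / (NIR + Red); for Sentinel-2 this is commonly computed from band 8 (near infrared) and band 4 (red). The function names `ndvi` and `parallel_ndvi`, the row-wise strip decomposition, and the serial fallback when `mpi4py` is not installed are illustrative assumptions, not the project's actual implementation, and the sketch leaves out reading the imagery and the SNAP processing steps.

```python
import numpy as np

try:
    from mpi4py import MPI  # MPI for Python
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Assumption for illustration: fall back to serial execution without mpi4py.
    comm, rank, size = None, 0, 1


def ndvi(red, nir):
    """Return (NIR - Red) / (NIR + Red); pixels with a zero denominator become NaN."""
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def parallel_ndvi(red, nir):
    """Split the image rows across ranks, compute NDVI per strip, gather on rank 0.

    Only rank 0 needs to hold the full bands; other ranks may pass None.
    Returns the full NDVI array on rank 0 and None on all other ranks.
    """
    if comm is None or size == 1:
        return ndvi(red, nir)
    if rank == 0:
        red_strips = np.array_split(red, size, axis=0)
        nir_strips = np.array_split(nir, size, axis=0)
    else:
        red_strips = nir_strips = None
    # Lowercase scatter/gather send pickled Python objects, one list item per rank.
    my_red = comm.scatter(red_strips, root=0)
    my_nir = comm.scatter(nir_strips, root=0)
    strips = comm.gather(ndvi(my_red, my_nir), root=0)
    return np.vstack(strips) if rank == 0 else None
```

Saved under a hypothetical name such as `ndvi_mpi.py` with a small driver that loads the two bands on rank 0, the sketch could be launched with, for example, `mpiexec -n 4 python ndvi_mpi.py`. Because NDVI is computed pixel by pixel, the strips need no overlap or communication between ranks, which makes it a convenient first case for comparing decomposition strategies.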
Student Prerequisites (compulsory):
Basic knowledge of Linux and Python programming; background in natural or physical science.
Student Prerequisites (desirable):
Familiarity with or interest in geographic information systems, remote sensing and natural science.
Familiarity with QGIS and SNAP software.
Experience working with geospatial data, C or Fortran.
General overview of the Earth Observation application (Weeks 1 and 2)
Introduction to SNAP software
Week 1: Training week
Week 2: Introduction to project case studies and HPC implementation.
Week 3: Submission of work plan, introduction to geographic information systems and remote sensing data.
Week 4: Hands-on work with geospatial data.
Weeks 5-7: Image processing and HPC implementation.
Week 8: Report preparation, submission, presentation.
Final Product Description:
The expected result will demonstrate the successful adoption of parallelization for processing Earth observation data sets. The final result will identify the best approach for the basic case study.
Adapting the Project: Increasing the Difficulty:
The project will use a basic workflow for Earth observation data sets. To increase the complexity level, this workflow can easily be extended to take multiple data inputs of varied resolution.
Adapting the Project: Decreasing the Difficulty:
To decrease the difficulty level, the basic workflow will be replaced with simple image manipulation using a single input and output. The resolution and multi-band aspects of multi-spectral imagery can also be dropped to simplify the workflow.
Only open-source software will be used in the project and any supplementary material or training activities can be completed on a personal computer.
ICHEC-Irish Centre for High-End Computing