Big data management for better electricity consumption prediction

Project reference: 2129
The main objective of this SoHPC project is to test how our existing code for energy consumption prediction scales from a local server to a supercomputer.
We have developed Python and R scripts to retrieve data, store it in MongoDB, and load it back when needed. Additionally, based on the historical data, we have developed scripts that build prediction models.
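As an illustration of this store-and-load step, here is a minimal R sketch using the mongolite package; the connection URL, database, collection, and field names are hypothetical, and the project's actual scripts may differ.

    # A minimal sketch of the store-and-load step, assuming a local MongoDB
    # instance; database, collection, and field names are hypothetical.
    library(mongolite)

    con <- mongo(collection = "consumption", db = "energy",
                 url = "mongodb://localhost:27017")

    # store a data frame of (hypothetical) hourly readings
    readings <- data.frame(meter_id = c("M1", "M1", "M2"),
                           ts  = as.POSIXct("2020-07-01 00:00:00") + 3600 * 0:2,
                           kwh = c(1.2, 0.9, 2.4))
    con$insert(readings)

    # load the data back when needed, filtered by meter
    m1 <- con$find('{"meter_id": "M1"}')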
Using deep neural networks, building each prediction model takes approximately 2 minutes and approximately 8 MB of memory.
Therefore, the main goal of this project is to test how the existing R and Python scripts can be adapted so that we can build 10,000 models and their predictions within a time limit of approximately one hour, using a supercomputer with state-of-the-art compute nodes and storage.
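Since each meter's model can be built independently, the scaling task is naturally parallel. Below is a minimal sketch of this idea in R using the parallel and nnet packages; the data frame, column names, and model settings are hypothetical, and on a supercomputer the per-core mclapply would typically be complemented by distributing meters across nodes through the batch scheduler.

    # A minimal sketch of the parallel model-building step, assuming one
    # small feed-forward network per meter (nnet package); the data frame
    # 'history' and its columns are hypothetical.
    library(parallel)
    library(nnet)

    # hypothetical historical data: one row per meter per hour
    history <- data.frame(meter_id = rep(c("M1", "M2"), each = 48),
                          hour     = rep(0:23, 4),
                          weekday  = rep(1:2, each = 24, times = 2),
                          kwh      = runif(96, 0, 3))

    build_model <- function(df) {
      # one small prediction model trained on a single meter's history
      nnet(kwh ~ hour + weekday, data = df,
           size = 5, linout = TRUE, trace = FALSE)
    }

    # build all per-meter models in parallel across the available cores
    per_meter <- split(history, history$meter_id)
    models <- mclapply(per_meter, build_model, mc.cores = detectCores())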

[Figure: the workflow of the project]
Project Mentor: Prof. Janez Povh, PhD
Project Co-mentor: Matic Rogar
Site Co-ordinator: Leon Kos
Participants: Irem Dundar, Omar Patricio Perez Znakar
Learning Outcomes:
- The student will master R and parallelization in R, using RStudio to create computing jobs;
- The student will master Hadoop, RHadoop, and MongoDB for big data management;
- The student will master basic analytics methods using RHadoop (see the sketch after this list);
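As a flavour of the RHadoop analytics mentioned in the last outcome, here is a minimal rmr2 sketch that computes the average consumption per meter with map-reduce; it assumes a working Hadoop installation and uses hypothetical data.

    # A minimal rmr2 sketch (assumes a working Hadoop installation, with the
    # HADOOP_CMD and HADOOP_STREAMING environment variables set for rmr2).
    library(rmr2)

    # hypothetical per-meter readings
    readings <- data.frame(meter_id = c("M1", "M1", "M2"),
                           kwh      = c(1.2, 0.9, 2.4))

    # push key-value pairs (meter id, consumption) to HDFS
    input <- to.dfs(keyval(readings$meter_id, readings$kwh))

    # map-reduce job: average consumption per meter
    out <- mapreduce(
      input  = input,
      map    = function(k, v) keyval(k, v),
      reduce = function(k, vv) keyval(k, mean(vv))
    )

    from.dfs(out)  # per-meter averages, back in R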
Student Prerequisites (compulsory):
R, Python
Basics of regression and classification
Student Prerequisites (desirable):
Basics of Hadoop
Basics of data management (NoSQL databases, e.g. MongoDB)
Training Materials:
The candidate should go through the PRACE MOOC:
https://www.futurelearn.com/admin/courses/big-data-r-hadoop/7
Workplan:
W1: introductory week;
W2: efficient I/O management of industrial big data files on the local HPC system (see the I/O sketch after this plan);
W3-4: studying the existing scripts for data management and prediction, and parallelizing them;
W5: testing the scripts on real data;
W6: preparing materials for the MOOC entitled MANAGING BIG DATA WITH R AND HADOOP;
W7: final report;
W8: wrap-up.
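For the W2 task, the sketch below shows one common approach to fast file I/O in R, using the data.table package; the file names are hypothetical.

    # A minimal sketch of fast file I/O in R with data.table;
    # the file names are hypothetical.
    library(data.table)

    # fread is typically much faster than read.csv on large industrial files
    readings <- fread("meter_readings.csv")

    # write intermediate results back just as efficiently
    fwrite(readings, "meter_readings_clean.csv")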
Final Product Description:
- Developed scripts that retrieve industrial big data files and store them in Hadoop;
- Created RHadoop scripts for parallel analysis and for computing new prediction models;
Adapting the Project: Increasing the Difficulty:
We can increase the size of the data or add a more demanding visualization task.
Adapting the Project: Decreasing the Difficulty:
We can decrease the size of the data or simplify the prediction models.
Resources:
Hadoop, R, RStudio, MongoDB and RHadoop installations at the University of Ljubljana, Faculty of Mechanical Engineering
Organisation:
UL-University of Ljubljana