Industrial Big Data analysis with RHadoop

Project reference: 1923
The project will consist of:
- Getting to know Hadoop and RHadoop;
- Defining a big data source related to Industry 4.0;
- Creating and storing big data files (BD);
- Preparing BD for basic analysis;
- Defining a predictive model and writing RHadoop code for building this model;
- Evaluation and application of the developed model on new data.
The student will create a big data file, store it in a distributed file system (DFS), perform basic analysis, and build a predictive model for new data using RHadoop.
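The workflow above (split data, per-chunk analysis, combined model, prediction on new data) can be illustrated with a minimal sketch. It is written in plain Python rather than RHadoop, and it is only one possible choice of model: a simple linear regression fitted from per-chunk sums, which is an assumption, since the project text does not fix the model. All function names and the toy data are hypothetical.

```python
from functools import reduce


def map_chunk(chunk):
    """Map step: summarise one chunk of (x, y) rows as partial sums."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: add the partial sums of two chunks element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(chunks):
    """Fit y = slope * x + intercept from the combined sums (least squares)."""
    n, sx, sy, sxx, sxy = reduce(combine, map(map_chunk, chunks))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


def predict(model, x):
    """Apply the fitted model to a new observation."""
    slope, intercept = model
    return slope * x + intercept


# Toy data, made up for illustration: two chunks standing in for file blocks.
chunks = [
    [(1.0, 3.0), (2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
model = fit_line(chunks)
prediction = predict(model, 5.0)
```

Because each chunk is reduced to five numbers before anything is combined, the full data set never has to sit in memory on one machine. This is the property that makes this kind of model a natural fit for a MapReduce framework.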
Project Mentor: Prof. Janez Povh, PhD
Project Co-mentor: MSc. Timotej Hrga
Site Co-ordinator: Doc. Dr. Leon Kos
Participant: Khyati Sethia
Learning Outcomes:
Student Prerequisites (compulsory):
- Basics of data management;
- R language;
- Basics of regression and classification.
Student Prerequisites (desirable):
- Basics of Hadoop.
Training Materials:
The candidate should go through the PRACE MOOC:
https://www.futurelearn.com/admin/courses/big-data-r-hadoop/4
Workplan:
- W1: introductory week;
- W2: creating and storing an industrial big data file;
- W3: coding and evaluating scripts for analysis;
- W4-W5: coding and evaluating scripts for analysis;
- W6: preparing materials for the MOOC entitled MANAGING BIG DATA WITH R AND HADOOP;
- W7: final report;
- W8: wrap-up.
Final Product Description:
- Industrial big data files retrieved and stored in Hadoop;
- RHadoop scripts created for analysis and new prediction models;
- A report on this example, to be used in the PRACE MOOC entitled MANAGING BIG DATA WITH R AND HADOOP.
Adapting the Project: Increasing the Difficulty:
We can increase the size of the data or add a more demanding visualization task.
Adapting the Project: Decreasing the Difficulty:
We can decrease the size of the data or simplify the prediction model.
Resources:
RHadoop installation at the University of Ljubljana, Faculty of Mechanical Engineering.