Big Data clustering with RHadoop

Project reference: 1822
The project will consist of:
- Getting to know Hadoop and RHadoop;
- Creating and storing big data (BD) files;
- Preparing the BD for the clustering task;
- Writing RHadoop code that performs the clustering task with two clustering algorithms (e.g., k-means and a variant of a local density-based algorithm);
- Evaluating the clustering algorithms.
The student will create a big data file, store it in the Hadoop distributed file system (DFS), and then run and evaluate two clustering algorithms on this data with RHadoop.
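To make the k-means part of the task concrete, the following is a minimal sketch of how one k-means iteration can be split into a map step and a reduce step. It is plain Python run in memory, not RHadoop code, and it is not the project's prescribed implementation. All function names, the sample points and the starting centroids are assumptions chosen for illustration. The last function computes the within-cluster sum of squares, one common way to evaluate a k-means result.

```python
# Illustrative sketch only: all names and data below are hypothetical.

def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, (point, 1)) for each point."""
    for p in points:
        key = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        yield key, (p, 1)

def reduce_phase(pairs, centroids):
    """Reduce step: average the points emitted under each centroid index."""
    sums = {}
    for key, (p, n) in pairs:
        total, count = sums.get(key, ([0.0] * len(p), 0))
        sums[key] = ([t + x for t, x in zip(total, p)], count + n)
    # A centroid that received no points keeps its previous position.
    new_centroids = [tuple(c) for c in centroids]
    for key, (total, count) in sums.items():
        new_centroids[key] = tuple(t / count for t in total)
    return new_centroids

def kmeans(points, centroids, iterations=10):
    """Repeat the map and reduce steps a fixed number of times."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids

def wcss(points, centroids):
    """Within-cluster sum of squares: lower means tighter clusters."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

# Tiny made-up data set with two obvious groups.
sample_points = [(0, 0), (0, 2), (10, 10), (10, 12)]
start_centroids = [(0, 0), (10, 10)]
final_centroids = kmeans(sample_points, start_centroids)
print(final_centroids, wcss(sample_points, final_centroids))
```

On a real cluster the map step would run in parallel over blocks of the stored file and only the per-centroid sums and counts would travel to the reducers, which is why each emitted value carries a count alongside the point.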
Project Mentor: Prof. Janez Povh, PhD
Project Co-mentor: Prof. Leon Kos, PhD
Site Co-ordinator: Prof. Leon Kos, PhD
Learning Outcomes:
- The student will master Hadoop and RHadoop;
- The student will master at least two clustering algorithms.
Student Prerequisites (compulsory):
- Basics of data management;
- R;
- Basics of clustering.
Student Prerequisites (desirable):
- Basics of Hadoop.
Training Materials:
The candidate should go through the following PRACE MOOC:
https://www.futurelearn.com/courses/big-data-r-hadoop/2/todo/17356
Workplan:
- W1: Introductory week;
- W2: Creating and storing a big data file;
- W3-5: Coding and evaluating two clustering algorithms;
- W6: Creating final video presentations;
- W7: Final report;
- W8: Wrap up.
Final Product Description:
- Big data files created and stored in Hadoop;
- RHadoop scripts for clustering;
- A video presenting this example.
Adapting the Project: Increasing the Difficulty:
We can increase the size of the data or add a further clustering algorithm to be implemented.
Adapting the Project: Decreasing the Difficulty:
We can decrease the size of the data or remove one clustering algorithm.
Resources:
RHadoop installation at the University of Ljubljana, Faculty of Mechanical Engineering