Big Data clustering with RHadoop

Project reference: 1822

The project will consist of:

  • Getting to know Hadoop and RHadoop;
  • Creating and storing big data (BD) files;
  • Preparing the BD for the clustering task;
  • Writing RHadoop code that performs the clustering task with two clustering algorithms (e.g., k-means and some variant of a local-density-based algorithm);
  • Evaluating the clustering algorithms.

The student will create a big data file, store it in the DFS, and then run and evaluate two clustering algorithms on this data with RHadoop.
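
To make the k-means part of the task concrete, here is a minimal sketch of one way the algorithm splits into map and reduce steps. It is written in plain Python rather than RHadoop, runs on a toy in-memory list instead of a file in the DFS, and all function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) are hypothetical. An RHadoop version would express the same two steps as the map and reduce functions of a MapReduce job.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points that share one centroid index."""
    total = [0.0] * len(values[0][0])
    count = 0
    for point, n in values:
        total = [t + x for t, x in zip(total, point)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One map -> shuffle -> reduce pass; returns the updated centroids."""
    groups = defaultdict(list)
    for point in points:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)  # the "shuffle": group values by key
    # A centroid that attracted no points keeps its previous position.
    return [
        kmeans_reduce(groups[i]) if i in groups else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, iterations=10):
    """Repeat the map/reduce pass a fixed number of times."""
    for _ in range(iterations):
        centroids = kmeans_iteration(points, centroids)
    return centroids


# Toy data: two obvious groups and two assumed starting centroids.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
print(kmeans(points, [(0.0, 0.0), (5.0, 5.0)]))
# prints [(1.0, 1.5), (9.5, 9.0)]
```

The map step touches each point independently, which is what allows the work to be spread over the blocks of a large file. Only the per-cluster averages have to be combined in the reduce step.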

Project Mentor: Prof. Janez Povh, PhD

Project Co-mentor: Prof. Leon Kos, PhD

Site Co-ordinator: Prof. Leon Kos, PhD

Learning Outcomes:

  • The student will master Hadoop and RHadoop;
  • The student will master at least two clustering algorithms.

Student Prerequisites (compulsory):

  • Basics of data management;
  • R;
  • Basics of clustering.

Student Prerequisites (desirable):

  • Basics of Hadoop.

Training Materials:
The candidate should go through the following PRACE MOOC:

Workplan:

  • W1: Introductory week;
  • W2: Creating and storing big data files;
  • W3-5: Coding and evaluating two clustering algorithms;
  • W6: Creating final video presentations;
  • W7: Final report;
  • W8: Wrap up.
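
The evaluation step in weeks 3-5 needs some numerical criterion for comparing the two algorithms. The project description does not name one, so the sketch below assumes the within-cluster sum of squares (WCSS), where lower values mean tighter clusters. It is plain Python on toy data, and the function names `assign` and `wcss` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


def wcss(points, centroids, labels):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))


points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]

# Two candidate sets of centroids, standing in for the output of two algorithms.
good = [(1.0, 1.5), (9.5, 9.0)]
poor = [(1.0, 1.0), (1.0, 2.0)]

score_good = wcss(points, good, assign(points, good))
score_poor = wcss(points, poor, assign(points, poor))
print(score_good < score_poor)  # prints True
```

WCSS only suits centroid-based results. A density-based algorithm that marks some points as noise would need a criterion that does not depend on centroids.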

Final Product Description:

  • Big data files created and stored in Hadoop;
  • RHadoop scripts for clustering;
  • A video presenting this example.
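
The first deliverable, the big data file, could be produced synthetically so that the true clusters are known in advance. The following sketch assumes Gaussian blobs around hand-picked centres, a CSV layout, and a file named `points.csv` in the temporary directory. None of these choices comes from the project description, and `write_blobs` is a hypothetical helper. Copying the file into the DFS is a separate step that is not shown here.

```python
import csv
import os
import random
import tempfile


def write_blobs(path, centers, points_per_center, spread=1.0, seed=0):
    """Write points drawn from Gaussian blobs around `centers` to a CSV file."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    rows = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for center in centers:
            for _ in range(points_per_center):
                writer.writerow([round(rng.gauss(c, spread), 4) for c in center])
                rows += 1
    return rows


# Assumed output location and cluster centres, for illustration only.
path = os.path.join(tempfile.gettempdir(), "points.csv")
n = write_blobs(path, [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)], 1000)
print(n)  # prints 3000
```

Raising `points_per_center` or adding centres scales the file up without changing the code, which matches the way the project difficulty is meant to be adjusted.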

Adapting the Project: Increasing the Difficulty:
We can increase the size of the data or add a further clustering algorithm to be implemented.

Adapting the Project: Decreasing the Difficulty:
We can decrease the size of the data or remove one clustering algorithm.

RHadoop installation at the University of Ljubljana, Faculty of Mechanical Engineering

University of Ljubljana
