Big Data clustering with RHadoop

Project reference: 1822

The project will consist of:

  • Getting to know Hadoop and RHadoop;
  • Creating and storing big data (BD) files;
  • Preparing the BD for the clustering task;
  • Writing RHadoop code to perform clustering with two algorithms (e.g., k-means and a variant of a local density-based algorithm);
  • Evaluating the clustering algorithms.
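
To make the clustering step in the list above more concrete, here is a minimal sketch of how one k-means iteration splits into a map phase and a reduce phase. It is plain Python rather than RHadoop code, it runs in memory instead of on Hadoop, and every function name in it is hypothetical; the project itself would express the same two phases in R with RHadoop.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: emit a (cluster index, point) pair for every input point."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each cluster to get new centroids."""
    groups = defaultdict(list)
    for key, point in pairs:
        groups[key].append(point)
    new_centroids = list(centroids)  # a cluster with no points keeps its old centroid
    for key, members in groups.items():
        new_centroids[key] = tuple(sum(col) / len(members) for col in zip(*members))
    return new_centroids


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds (no convergence check in this sketch)."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids


# Toy data set: two well-separated groups of two points each.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
final_centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

On a real cluster, each round would be one MapReduce job, and only the small list of centroids would be passed from one round to the next.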

The student will create a big data file, store it in the DFS, and run and evaluate two clustering algorithms on it with RHadoop.
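
The project description does not say how the algorithms are to be evaluated. As one possible illustration, the sketch below computes the within-cluster sum of squares (WCSS), a common internal quality measure for k-means-style results. The choice of metric and the function name are assumptions, not part of the project specification.

```python
def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to the centroid of its cluster."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )


# Toy example: four points, two clusters, centroids at the cluster means.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
labels = [0, 0, 1, 1]
centroids = [(1.0, 1.5), (9.0, 9.5)]
wcss = within_cluster_ss(points, labels, centroids)
```

Lower values mean tighter clusters, but WCSS always decreases as the number of clusters grows, so it is mainly useful for comparing results with the same number of clusters.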

Project Mentor: Prof. Janez Povh, PhD

Project Co-mentor: Prof. Leon Kos, PhD

Site Co-ordinator: Prof. Leon Kos, PhD

Learning Outcomes:

  • The student will master Hadoop and RHadoop;
  • The student will master at least two clustering algorithms.

Student Prerequisites (compulsory):

  • Basics of data management;
  • R;
  • Basics of clustering.

Student Prerequisites (desirable):

  • Basics of Hadoop.

Training Materials:
The candidate should go through the following PRACE MOOC:


Workplan:

  • W1: Introductory week;
  • W2: Creating and storing a big data file;
  • W3-5: Coding and evaluating two clustering algorithms;
  • W6: Creating final video presentations;
  • W7: Final report;
  • W8: Wrap up.

Final Product Description:

  • Big data files created and stored in Hadoop;
  • RHadoop scripts for clustering;
  • A video presenting this example.

Adapting the Project: Increasing the Difficulty:
We can increase the size of the data or add a further clustering algorithm to implement.

Adapting the Project: Decreasing the Difficulty:
We can decrease the size of the data or remove one clustering algorithm.

RHadoop installation at the University of Ljubljana, Faculty of Mechanical Engineering

University of Ljubljana
