Parallel algorithm for non-negative matrix tri-factorization

Project reference: 1720

In computational biology we typically have p ≥ 3 sets of data points P1, P2, …, Pp (e.g., sets of different biological objects such as proteins, genes, drugs, and diseases) with |Pi| = ni. For each pair i < j we have a data matrix Dij containing relations between data points from Pi and Pj (inter-type connections, e.g., drugs-genes), while for i = j the data matrices Dii contain intra-type connections (e.g., gene-gene or drug-drug interactions).

A way to mine all these data sets simultaneously is to solve the non-negative matrix tri-factorization problem, which can be formulated as follows:
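
One commonly used way to write such a problem is sketched below; it is not necessarily the exact model used in this project. The factor names G_i and S_ij, the ranks k_i, and the choice to require both factors to be non-negative are assumptions made for illustration. Variants in the literature add orthogonality constraints on G_i or treat the intra-type matrices Dii through a separate regularization term.

```latex
\min_{G_i \ge 0,\; S_{ij} \ge 0} \;
\sum_{1 \le i \le j \le p}
\left\| D_{ij} - G_i \, S_{ij} \, G_j^{T} \right\|_F^{2},
\qquad
G_i \in \mathbb{R}^{n_i \times k_i},\quad
S_{ij} \in \mathbb{R}^{k_i \times k_j}
```

Here each G_i can be read as a soft assignment of the n_i objects in Pi to k_i clusters, and S_ij as a compressed k_i × k_j summary of the relations stored in Dij.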

Considerable effort has been devoted to solving this hard optimization problem, mainly using the fixed point method. We will take the best existing academic code, parallelize it in C++ with OpenMPI, and test it on a local HPC machine.
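
To give a feel for what a fixed point iteration looks like here, the following is a minimal serial sketch for a single data matrix D ≈ G1 S G2^T, using multiplicative updates. This is an assumption about the kind of scheme involved, not the academic code referred to above. The `Matrix` type and the function names (`multiply`, `transpose`, `objective`, `nmtf_step`) are hypothetical helpers written only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal dense row-major matrix (hypothetical helper for this sketch).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> a;
    Matrix(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), a(r * c, v) {}
    double& operator()(std::size_t i, std::size_t j) { return a[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return a[i * cols + j]; }
};

Matrix transpose(const Matrix& m) {
    Matrix t(m.cols, m.rows);
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = 0; j < m.cols; ++j) t(j, i) = m(i, j);
    return t;
}

Matrix multiply(const Matrix& x, const Matrix& y) {
    assert(x.cols == y.rows);
    Matrix z(x.rows, y.cols);
    for (std::size_t i = 0; i < x.rows; ++i)
        for (std::size_t k = 0; k < x.cols; ++k)
            for (std::size_t j = 0; j < y.cols; ++j) z(i, j) += x(i, k) * y(k, j);
    return z;
}

// Squared Frobenius error ||D - G1 * S * G2^T||_F^2.
double objective(const Matrix& D, const Matrix& G1, const Matrix& S, const Matrix& G2) {
    const Matrix R = multiply(multiply(G1, S), transpose(G2));
    double err = 0.0;
    for (std::size_t i = 0; i < D.a.size(); ++i) {
        const double d = D.a[i] - R.a[i];
        err += d * d;
    }
    return err;
}

// Element-wise X <- X * num / (den + eps); keeps non-negative entries non-negative.
void scale(Matrix& X, const Matrix& num, const Matrix& den) {
    assert(X.a.size() == num.a.size() && X.a.size() == den.a.size());
    const double eps = 1e-12;  // guards against division by zero
    for (std::size_t i = 0; i < X.a.size(); ++i) X.a[i] *= num.a[i] / (den.a[i] + eps);
}

// One sweep of multiplicative updates over G1, G2 and S.
void nmtf_step(const Matrix& D, Matrix& G1, Matrix& S, Matrix& G2) {
    // G1 <- G1 .* (D H^T) ./ (G1 H H^T), with H = S G2^T.
    const Matrix H = multiply(S, transpose(G2));
    const Matrix Ht = transpose(H);
    scale(G1, multiply(D, Ht), multiply(G1, multiply(H, Ht)));

    // G2 <- G2 .* (D^T W) ./ (G2 W^T W), with W = G1 S.
    const Matrix W = multiply(G1, S);
    scale(G2, multiply(transpose(D), W), multiply(G2, multiply(transpose(W), W)));

    // S <- S .* (G1^T D G2) ./ (G1^T G1 S G2^T G2).
    const Matrix G1t = transpose(G1);
    const Matrix num = multiply(multiply(G1t, D), G2);
    const Matrix den = multiply(multiply(multiply(G1t, G1), S), multiply(transpose(G2), G2));
    scale(S, num, den);
}
```

Calling `nmtf_step` repeatedly from a positive starting point, and stopping once `objective` no longer changes noticeably, gives the iteration. Almost all of the work is in the dense matrix products, which makes them the natural place to distribute rows or blocks of D across MPI processes.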

Results will be tested on real data from biomedicine.

Non-negative matrix tri-factorization (NMTF) applied to data associating genes, diseases and drugs can help with (i) reconstructing the data matrices Dij, (ii) co-clustering the data sets, and (iii) detecting new associations (triangles of connections) between the data sets.
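
As an illustration of point (ii), a common way to read clusters off a non-negative factor is to assign each object to the column holding the largest entry in its row. The sketch below assumes a factor matrix G stored as a vector of rows, with one row per object and one column per cluster. The function name `assign_clusters` is hypothetical, and the project may use a different assignment rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// For each row of G (one object), return the index of its largest entry,
// interpreted as the cluster that object belongs to. Ties go to the
// lowest column index.
std::vector<std::size_t> assign_clusters(const std::vector<std::vector<double>>& G) {
    std::vector<std::size_t> cluster;
    cluster.reserve(G.size());
    for (const auto& row : G) {
        assert(!row.empty());
        const auto best = std::max_element(row.begin(), row.end());
        cluster.push_back(static_cast<std::size_t>(std::distance(row.begin(), best)));
    }
    return cluster;
}
```

Applying the same rule to the factors of two different data sets, for example genes and drugs, yields the co-clustering: the objects of each type are grouped at the same time, linked through the shared middle factor.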

Project Mentor: Prof. Janez Povh, Ph.D.

Site Co-ordinator: Leon Kos, Ph.D.

Learning Outcomes:

  • Fixed point method to solve optimization problems;
  • C++ and OpenMPI programming on HPC.

Student Prerequisites (compulsory):

C++ coding

Student Prerequisites (desirable): 

Basics from mathematical optimization or data science

Training Materials:

See attached paper:

  1. N. Pržulj, N. Malod-Dognin, Network analytics in the age of big data. Science, 353(6295):123–124, 2016

Workplan:

  • Week 2: study of the method and design of the development environment
  • Week 3: programming the method
  • Week 4: testing and improving the code
  • Week 5: testing the method on real data
  • Weeks 6-7: writing the final report and the presentation
  • Week 8: wrapping up

Final Product Description: 

The final result will be used within a research project that is already running at the hosting institution and will be incorporated into the dissemination activities of this project.

Adapting the Project: Increasing the Difficulty:

There are several extensions of the underlying optimization problem that we can start working on.

Resources:

The student needs basic knowledge of C++. During the project, the student will get access to the local HPC machine.

Organisation:
University of Ljubljana

Posted in Projects 2017