GPU acceleration of Breadth First Search algorithm in applications of Social Networks

GPU acceleration of Breadth First Search algorithm in applications of Social Networks
Caption 1: Breadth First Search discovers distances to nodes* Caption 2: A Social Network Visualisation * D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press, 2010.

Project reference: 2008

A Breadth First Search (BFS) is one of the core graph based searching algorithms. It can run in O(N + E), where N is the number of vertices and E is the number of edges of the graph. Due to the irregular memory access patterns and the unstructured nature of the large graphs, its parallel implementation is very challenging. Several parallel programming approaches were implemented in the literature. Especially, the use of GPUs has been an interesting area of research in terms of exploiting the architectural features such as high throughput and use of shared memory

In this project, we first aim to implement a BFS in CUDA using vertex-centric approach, i.e., one thread per vertex in the graph. We will then look into an edge-centric CUDA implementation. Following on, we will analyse the code in the NVIDIA Visual Profiler and investigate how we can improve the performance by using novel approaches such as CUDA dynamic parallelism, Hyper-Q or NVSHMEM to benefit from the architecture.

Upon achieving sufficient usefulness,  functionality, and performance, the code can be associated with Social Networks where each vertex correspond to a different person, and each edge represents a friendship between two people. The question of interest would be then finding people’s degree of separation from each other.

Caption 1: Breadth First Search discovers distances to nodes*
Caption 2: A Social Network Visualisation
* D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press, 2010.

Project Mentor: Buket Benek Gursoy

Project Co-mentor: /

Site Co-ordinator: Simon Wong

Participants: Busenur Aktilav, Berker Demirel

Learning Outcomes:

  • Use of an HPC system and profiling of an application.
  • Writing clean and reusable C for use in other HPC projects.
  • Solving problems using GPU computing and Code profiling.
  • Exposure to Social Networking Graph Search Algorithms.

Student Prerequisites (compulsory)
Intermediate C, Basic familiarity with programming in CUDA.

Student Prerequisites (desirable):
Familiarity with GPU technology.
Familiarity with NVIDIA Profiling Tools.
Familiarity with Search algorithms.
Familiarity with Social Networking Graphs.

Training Materials:

BFS Code Examples:
https://github.com/maxdan94/BFS
https://github.com/bamfbamf/bfs-cudahttps://github.com/onesuper/bfs_in_parallel

Related Publications:

  1. P. Harish and P. J. Narayanan, Accelerating large graph algorithms on the gpu using cuda, In Proceedings of the 14th International Conference on High Performance Computing, HiPC’07, pp. 197-208, Berlin, Heidelberg, 2007, Springer-Verlag.
  2. C. D. Pise and S. W. Shende, Parallelization of BFS Graph Algorithm using CUDA, IJCAT International Journal of Computing and Technology, Volume 1, Issue 3, April 2014.
  3. M. Springer, Breadth-First Search in CUDA, Tokyo Institute of Technology

Workplan:

  •  Week 1: Training week
  • Week 2: Overview of the project; Familiarising with the HPC system at ICHEC; Introduction to social networking graph algorithms and C and CUDA concepts; Learn about BFS; Write project plan
  • Week 3: Serial implementation of the BFS algorithm; Learn about its parallelisation approaches on GPUs.
  • Week 4: CUDA implementation of the BFS algorithm.
  • Week 5: Profiling and Improvements to CUDA Code.
  • Week 6: Applications to Social Networks.
  • Week 7-8: Write Report, Prepare Video Presentation for PRACE and Presentation at ICHEC

Final Product Description:
A clean parallelised code of the BFS algorithm in CUDA; Profiling the CUDA code and Potential application to Social Networks.

Adapting the Project: Increasing the Difficulty:
The student could attempt to implement advanced optimisation of the CUDA code using approaches such as dynamic parallelism or NVSHMEM.

Adapting the Project: Decreasing the Difficulty:
The student could stop at implementing after the CUDA accelerated BFS, or anywhere along that path, and finalise with the profiling.

Resources:
A computer (The student should bring their own.)
Access to HPC system (ICHEC’s HPC system, Kay, will be made available.)

Organisation:
Irish Centre for High-End Computing

Tagged with: , ,

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.