Why is high performance computing so good for image analysis?
My Summer of HPC project was officially titled “Development of a sample application in PyCOMPSs”. As you can probably tell, it’s a pretty vague title, and I was given a lot of freedom to come up with my own project. I’ll discuss the project, and explain what PyCOMPSs is, towards the end.
The general theme of my project is high performance computing applied to image processing.
High performance computing (HPC) refers generally to the practice of solving complex problems quickly and efficiently. The main tenet of HPC I am focusing on is parallel computing, where multiple processing units are used simultaneously to perform a computation or solve a task. Most applications are written “sequentially”, meaning the computations of the program happen one after the other. For some tasks, however, the order of computation does not matter, for example if you wanted to sum up the elements of two separate lists. It doesn’t matter which list you sum first. If you could do both summations simultaneously, you would in theory get a two-times speedup of your application. This is the idea behind parallel computing. Supercomputers and graphics cards have thousands of computing units, which allow them to run highly parallelized code. The Barcelona Supercomputing Center has its own supercomputer called MareNostrum, which has almost 50,000 processor cores!
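To make the two-lists example concrete, here is a minimal sketch in Python using the standard library. The function names are my own, and a thread pool is used only so the sketch runs anywhere; in CPython, threads do not actually speed up pure-Python arithmetic, so real code would more likely use separate processes or a framework such as PyCOMPSs.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_sequential(list_a, list_b):
    """Sum the two lists one after the other."""
    return sum(list_a), sum(list_b)


def sum_parallel(list_a, list_b):
    """Submit both sums as independent tasks, then collect the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Neither task depends on the other, so they may run in any order.
        future_a = pool.submit(sum, list_a)
        future_b = pool.submit(sum, list_b)
        return future_a.result(), future_b.result()


if __name__ == "__main__":
    print(sum_sequential([1, 2, 3], [10, 20]))
    print(sum_parallel([1, 2, 3], [10, 20]))
```

Both functions return the same pair of totals; the only difference is that the parallel version does not force one sum to wait for the other.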
Image processing and image analysis are about extracting meaningful information from images. Images on computers are represented by a matrix of so-called pixels; the width × height of this matrix is the resolution. Each pixel contains information about the amount of red, green and blue in the image at that point. Image processing and analysis are tasks highly suited to high performance computing and parallel processing.
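As a rough illustration of that representation, the sketch below stores a tiny image as a list of rows of (red, green, blue) tuples. The pixel values and helper names are made up for this example; real code would normally use a library such as NumPy or OpenCV instead of nested lists.

```python
# A tiny image, 3 pixels wide and 2 pixels high. Each pixel is a
# (red, green, blue) tuple with values from 0 to 255 (made-up values).
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]


def resolution(img):
    """Return (width, height) of an image stored as a list of rows."""
    return len(img[0]), len(img)


def to_grayscale(img):
    """Average the three colour channels of every pixel (simple, unweighted)."""
    return [[sum(pixel) // 3 for pixel in row] for row in img]


if __name__ == "__main__":
    print(resolution(image))
    print(to_grayscale(image))
```

Note that every pixel is converted independently of its neighbours, which is exactly the kind of work that can be spread over many processing units.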
I’ll give some examples. When processing an image you might only want to consider a subset of the image; let’s say you are searching for a face. The face will only take up a small part of the image, so to detect it you need to focus on that subset, or image window. Sequentially, you could iterate over the image and process one window at a time, but you could also use parallel computing to process the windows simultaneously, with a speedup that ideally grows linearly with the number of windows processed at once (limited by the number of processing units available). This approach is known as the sliding window technique. There is more on this in this excellent blog post: http://www.pyimagesearch.com/2015/03/23/sliding-windows-for-object-detection-with-python-and-opencv/
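Here is a minimal sketch of the sliding window idea, assuming a grayscale image stored as a list of rows. The `score_window` function is a hypothetical stand-in for a real face detector (it just measures mean brightness), and the thread pool is only there to show that each window can be scored independently.

```python
from concurrent.futures import ThreadPoolExecutor


def sliding_windows(image, window_size, step):
    """Yield (x, y, window) for every window position that fits in the image."""
    height, width = len(image), len(image[0])
    win_w, win_h = window_size
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            window = [row[x:x + win_w] for row in image[y:y + win_h]]
            yield x, y, window


def score_window(window):
    """Hypothetical stand-in for a face detector: mean brightness."""
    values = [v for row in window for v in row]
    return sum(values) / len(values)


def best_window(image, window_size=(2, 2), step=1):
    """Score all windows as independent tasks and return the best (x, y, score)."""
    windows = list(sliding_windows(image, window_size, step))
    with ThreadPoolExecutor() as pool:
        # No window depends on any other, so the scoring order does not matter.
        scores = list(pool.map(lambda item: score_window(item[2]), windows))
    best = max(range(len(windows)), key=lambda i: scores[i])
    x, y, _ = windows[best]
    return x, y, scores[best]


if __name__ == "__main__":
    picture = [
        [0, 0, 0, 0],
        [0, 0, 9, 9],
        [0, 0, 9, 9],
        [0, 0, 0, 0],
    ]
    print(best_window(picture))
```

The sequential version would simply loop over `windows` and call `score_window` on each one in turn; the result is the same either way.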
The face might not always appear at the same size in the image, however. To find it you can process the image at different scales. With parallel computing you can process the different scales in parallel, and the windows within each scale in parallel too. For each image scale you take the window with the highest probability of a face detection, then you take the window with the maximum probability over all the scales. This is called using an image pyramid.
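The image pyramid can be sketched in the same spirit. This is only an illustration under simplifying assumptions: the image is grayscale, each pyramid level is made by keeping every second pixel (real code would smooth the image before shrinking it), and mean brightness again stands in for a hypothetical face detector. All function names are my own.

```python
from concurrent.futures import ThreadPoolExecutor


def downscale(image, factor):
    """Shrink a grayscale image by keeping every `factor`-th pixel."""
    return [row[::factor] for row in image[::factor]]


def pyramid(image, min_size=2):
    """Yield (factor, level) pairs, halving the image until it gets too small."""
    factor = 1
    while True:
        level = downscale(image, factor)
        if len(level) < min_size or len(level[0]) < min_size:
            break
        yield factor, level
        factor *= 2


def best_window_at_scale(level, window=2):
    """Return (score, x, y) of the best window in one pyramid level.

    The score is mean brightness, a hypothetical stand-in for a detector.
    """
    best = None
    for y in range(len(level) - window + 1):
        for x in range(len(level[0]) - window + 1):
            values = [v for row in level[y:y + window] for v in row[x:x + window]]
            candidate = (sum(values) / len(values), x, y)
            if best is None or candidate[0] > best[0]:
                best = candidate
    return best


def detect(image):
    """Search every scale as an independent task, then keep the overall best."""
    levels = list(pyramid(image))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda lv: best_window_at_scale(lv[1]), levels))
    index = max(range(len(levels)), key=lambda i: results[i][0])
    score, x, y = results[index]
    factor = levels[index][0]
    # Map the window position back to coordinates in the original image.
    return factor, x * factor, y * factor, score
```

Calling `detect(image)` returns the scale factor at which the best window was found, its position in the original image, and its score.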
Many machine learning applications process thousands of images, and each image can also be processed in parallel.
My project is a combination of everything mentioned above. First I’ll mention the inspiration for my project: the fantastically titled paper “What Makes Paris Look like Paris?” by Dr. Carl Doersch and colleagues, found here: http://graphics.cs.cmu.edu/projects/whatMakesParis/ . The paper uses image processing and parallel computing to process thousands of randomly chosen images from Google Street View in Paris and other cities, automatically extracting visually coherent and distinctive elements of Paris, like Parisian-style balconies, windows and street signs.
My goal is to develop a similar system in Python for high performance computing, but applying the techniques to images of castles in order to extract distinctively castle-like features such as arrow slits, battlements, big gates and towers.
The PyCOMPSs system I will be using is a programming model that aims to ease the development of applications for distributed infrastructures such as clusters, grids and clouds. I will be deploying my application on the MareNostrum supercomputer.
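To give a flavour of the programming model, here is a small sketch of how processing many images might look with PyCOMPSs. It assumes the `task` decorator and `compss_wait_on` function from the PyCOMPSs API; the image data and the `count_bright_pixels` function are made up for illustration and are not part of my actual project. The fallback at the top is my own addition so that the sketch still runs sequentially on a machine without PyCOMPSs installed.

```python
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    # Assumption: PyCOMPSs is not installed, so fall back to plain
    # sequential Python with do-nothing replacements.
    def task(*args, **kwargs):
        def decorator(func):
            return func
        return decorator

    def compss_wait_on(obj):
        return obj


@task(returns=1)
def count_bright_pixels(image, threshold):
    """Count the pixels brighter than `threshold` in one grayscale image."""
    return sum(1 for row in image for value in row if value > threshold)


def count_bright_in_all(images, threshold=5):
    """Start one task per image, then wait for all the results."""
    # Under PyCOMPSs each call returns a placeholder straight away, and the
    # runtime is free to run the tasks in parallel on different nodes.
    pending = [count_bright_pixels(image, threshold) for image in images]
    return compss_wait_on(pending)


if __name__ == "__main__":
    images = [
        [[0, 9], [9, 9]],
        [[0, 0], [0, 0]],
        [[7, 1], [2, 8]],
    ]
    print(count_bright_in_all(images))
```

The appeal of this style is that the code still reads like an ordinary sequential Python script; the decorator marks which functions may be run as parallel tasks. With PyCOMPSs installed, a script like this would typically be launched with the `runcompss` command rather than with plain `python`.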
More posts on my trials and tribulations to come!
Also a sneaky bonus pic of the excellent hiking I did in the Pyrenees.