Switching things up with the Neural Compute Stick!
Having explained in my last two blog posts what the Intel Neural Compute Stick is and how to use it with the OpenVINO toolkit, in this post I will describe what the “dynamic” in the title of my project stands for. As explained in my first blog post, the Neural Compute Stick is meant to be used together with lightweight computers such as the Raspberry Pi to speed up computations for visual applications. Such devices are often deployed at the edge: on a satellite running Earth-observation missions, for example, or on an underwater robot maintaining critical structures such as pipelines. Even with the Neural Compute Stick accelerating computations, one problem remains: once a model is deployed on an edge device, it is hard to repurpose the device to run a different kind of model. This is where the “dynamic” part of my project's title comes into play. The Neural Compute Stick is a highly parallel piece of hardware, with twelve processing units running computations independently. This makes it possible to load not just one but several models into its memory.
Beyond that, models can be swapped: old ones removed from memory and new ones loaded in. This allows the device to adapt to new situations in the field, for instance when the feature space changes or when the objects to be detected change. The simplest examples are the sun setting or the weather turning bad. Another motivation for switching models is saving power, since edge devices usually have limited energy sources. Instead of deploying one big model meant to cover every case that could occur in production, one could deploy many small models and load them in and out of memory at runtime.
With this in mind, I investigated whether this is feasible and implemented a small prototype that switches models at runtime. The prototype uses two models, one detecting human bodies and one detecting faces, and switches between them. Both are so-called Single Shot Detector (SSD) MobileNets, networks that are well suited for deployment on lightweight devices such as the Raspberry Pi. They localize and classify objects in a single pass through the network and output a bounding box for each object they detect, which the application can then draw onto the image.
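To make the single-pass output more concrete, here is a minimal sketch of how an application might turn an SSD network's raw output into pixel-space bounding boxes. It assumes the `[1, 1, N, 7]` DetectionOutput layout commonly produced by SSD models in OpenCV and OpenVINO, where each row holds an image id, a class label, a confidence and the normalized box corners. The function name and the 0.5 confidence threshold are illustrative choices, not the prototype's actual code.

```python
import numpy as np


def decode_ssd_detections(output, frame_width, frame_height, conf_threshold=0.5):
    """Convert an SSD DetectionOutput blob into pixel-space boxes.

    Assumes the [1, 1, N, 7] layout in which each row is
    [image_id, label, confidence, x_min, y_min, x_max, y_max]
    with box corners normalized to [0, 1].
    """
    boxes = []
    for image_id, label, conf, x0, y0, x1, y1 in np.asarray(output).reshape(-1, 7):
        if image_id < 0:  # a negative image_id marks the end of valid detections
            break
        if conf < conf_threshold:  # drop low-confidence detections
            continue
        # Scale normalized corners to pixels and clip them to the frame.
        left = max(0, int(x0 * frame_width))
        top = max(0, int(y0 * frame_height))
        right = min(frame_width - 1, int(x1 * frame_width))
        bottom = min(frame_height - 1, int(y1 * frame_height))
        boxes.append((int(label), float(conf), (left, top, right, bottom)))
    return boxes
```

The returned boxes can then be drawn onto the frame, for example with `cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)`.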
I used OpenCV for this task, a library featuring all sorts of image-processing algorithms that is best described as the Swiss Army knife of visual applications. Alongside OpenCV, I used OpenVINO as the backend that runs the models on the Neural Compute Stick.
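A minimal sketch of one common way to combine the two, assuming the models are available in OpenVINO's intermediate representation (an `.xml` architecture file plus a `.bin` weights file). OpenCV's `dnn` module can read such a model and hand inference to OpenVINO's Inference Engine backend on the `MYRIAD` target, which is how OpenCV addresses the Myriad VPU in the Neural Compute Stick. The function names and the `cv` parameter are hypothetical. OpenCV is imported lazily only so that a stand-in module can be passed in when no stick is attached. Model-specific preprocessing such as scaling or mean subtraction is left at OpenCV's defaults here.

```python
def load_on_ncs(xml_path, bin_path, cv=None):
    """Read an OpenVINO IR model with OpenCV's dnn module and target the NCS."""
    if cv is None:
        import cv2 as cv  # lazy import: a stand-in module can be passed for testing
    net = cv.dnn.readNet(xml_path, bin_path)
    # Route inference through OpenVINO's Inference Engine ...
    net.setPreferableBackend(cv.dnn.DNN_BACKEND_INFERENCE_ENGINE)
    # ... and run it on the Myriad VPU inside the Neural Compute Stick.
    net.setPreferableTarget(cv.dnn.DNN_TARGET_MYRIAD)
    return net


def run_on_frame(net, frame, input_size, cv=None):
    """Run one forward pass; input_size is the model's (width, height)."""
    if cv is None:
        import cv2 as cv
    # Resize the frame and pack it into an NCHW blob (model-specific scaling/mean omitted).
    blob = cv.dnn.blobFromImage(frame, size=input_size)
    net.setInput(blob)
    return net.forward()
```

In this setup, switching models amounts to calling `load_on_ncs` again with a different pair of IR files, for example hypothetical `face.xml`/`face.bin` instead of `body.xml`/`body.bin`.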
Finally, I tested the prototype by repeatedly loading models into and removing them from the memory of the Neural Compute Stick. I did this at a deliberately high rate of one switch per frame to measure the latency of a model switch in the worst case. Each switch consists of reading the model's input and output dimensions from the XML file describing its architecture and then loading the model into the memory of the Neural Compute Stick. On average, switching added an overhead of about 14 percent to the overall runtime. To put this into perspective: capturing an image and generating an output took my application about half a second on average, and a model switch in between added a little less than a tenth of a second to that.
These numbers leave plenty of room for improvement. One improvement concerns parsing the model dimensions: I used a simple XML parser and re-read a model's input and output dimensions on every switch. Parsing the dimensions of all models that might be used on the Neural Compute Stick once, when the application starts, and storing them in a lookup table could cut the switch time almost in half. The switch could be sped up further by performing it asynchronously: while a model is being loaded onto the Neural Compute Stick, the next frame can already be captured instead of waiting for the switch to finish.
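Both improvements can be sketched together under some simplifying assumptions. `parse_io_dims` reads input and output shapes from an IR `.xml` file. It assumes the older IR layout, in which inputs are layers of type `Input` and a network output is any output port not consumed by an `<edge>`; newer IR versions use `Parameter` and `Result` layers instead. `build_dims_table` performs this parsing once for all candidate models at startup. `AsyncModelSwitcher` starts a load on a background thread so the caller can capture the next frame in the meantime. All of these names, including the `load_fn` callback that would wrap the actual loading onto the stick, are hypothetical.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_io_dims(xml_text):
    """Return ({input_name: dims}, {output_name: dims}) for an IR .xml string.

    Assumes the older IR layout: inputs are layers of type "Input", and a
    network output is any output port that no <edge> consumes.
    """
    root = ET.fromstring(xml_text)
    consumed = {(e.get("from-layer"), e.get("from-port")) for e in root.iter("edge")}
    inputs, outputs = {}, {}
    for layer in root.iter("layer"):
        out = layer.find("output")
        if out is None:
            continue
        for port in out.findall("port"):
            dims = tuple(int(d.text) for d in port.findall("dim"))
            if layer.get("type") == "Input":
                inputs[layer.get("name")] = dims
            elif (layer.get("id"), port.get("id")) not in consumed:
                outputs[layer.get("name")] = dims
    return inputs, outputs


def build_dims_table(model_paths):
    """Parse every candidate model once at startup: {name: (inputs, outputs)}."""
    return {name: parse_io_dims(Path(p).read_text()) for name, p in model_paths.items()}


class AsyncModelSwitcher:
    """Loads the next model on a background thread (hypothetical helper)."""

    def __init__(self, load_fn, dims_table):
        self._load = load_fn        # e.g. wraps loading the model onto the stick
        self._dims = dims_table     # precomputed, so no XML parsing per switch
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def request(self, name):
        """Start loading `name` and return immediately."""
        self._pending = self._pool.submit(self._load, name, self._dims[name])

    def wait(self):
        """Block until the requested model is ready and return it."""
        if self._pending is None:
            raise RuntimeError("no model switch has been requested")
        return self._pending.result()

    def close(self):
        self._pool.shutdown(wait=True)
```

In a capture loop, this would look like calling `switcher.request("faces")`, then grabbing the next frame, then `net = switcher.wait()` before running inference. Whether the overlap pays off depends on the underlying native calls releasing Python's GIL while they run.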
All in all, I found that while the prototype in its current state is not yet suitable for real-time applications, it could get there given the potential for improvement. For the many applications that impose no hard real-time constraints, however, it is already deployable.
This sums up my findings on this project. If you would like to learn more about it, feel free to have a look at my blog on the PRACE Summer of HPC 2019 website. Lastly, I would like to thank my supervisors for their amazing support throughout the project, and the staff at ICHEC in general for welcoming me and making this stay such a great experience!