The Case of the Missing Device Function
My project has been progressing quite well, and I'll say a bit more about that in the next blog post. This time, however, I present a more detailed description of how my project began. The narrative may have been dramatized at some points.
Business as usual
The clock’s drawing close to midnight, the rain is battering against the office window and the scent of fresh coffee is floating in the air. Usually, this sort of atmosphere would be relaxing, but unfortunately the cup of coffee is the last one I can afford for now. I’ve been without a case for weeks and apparently I’ll have to sell my trusty office chair to satiate my caffeine addiction. I hear that the stand-up working style isn’t too bad though, it’ll keep your legs healthy and posture proper.
The name’s Mikkonen. I’m the current owner of the not-so-successful ZZX Jülich Detective Agency. My specialization is missing HPC implementations of algorithms. Business hasn’t been too good lately, mostly because of all these new competing faces on the scene and because libraries like OpenCL make it too easy to write parallel code for all sorts of platforms. It’s ever more difficult for a man to make do with this profession in this corner of the world. I wonder if competitive coffee drinking is a thing? Or perhaps exotic table-dancing?
All of a sudden there’s a knock on the door. What a peculiar hour for an insurance salesman to bother me. “Come on in!”, I yell reluctantly. As the slowly opening door creaks (damn, I need to oil the hinges again), in walks a young woman in a red dress. She’s in about her mid-twenties and her brown, slightly curly hair extends past her shoulders. Her distress is obvious. However, her HIPs instantly capture my attention.
“Have a seat. How can I help you, Miss…?”
“R9 Fury Radeon.”
“I have a problem, Mr. Mikkonen”, she begins telling her tale, “and you’re my last hope.”
Quite intriguing and alarming, at the same time.
“I’ve contacted many agencies but none of them wanted to take up my task. You’re the last one on my list. It all started two weeks ago…
Oh, blast, now that I think about it, my agency literally is the last one in the phone book. Maybe that’s why business has been so slow? I’d need to change the name, but what about the memory of Zacharias “Zulu” Xavier, my late mentor, who tragically lost his life in a freak compiling accident…? Oh crap, I’ve got to listen to her.
…and thus I desperately need an extremely fast Coulombic force solver.”
“I’m not too familiar with n-body solvers. How do you expect me to come up with the code?”
“I’m a graphics card from a renowned family, AMD, but lately software developers have focused mostly on our rivals, the Nvidia.”
Sorting out family feuds? Not my favourite kind of gig.
“Not all of their work is in vain, though.”
She sets down a small piece of paper onto my desk.
“Here are the access codes to a Git repository with a Fast Multipole Method solver written in CUDA. Don’t worry, they’re perfectly legally acquired. I want you to clone it and transform it into something I could use.”
How about AM Investigations, or Antti’s Covert Surveillance…
“Oh, yes! I was already thinking of a possible parallelization scheme! I’ll accept this task. Sounds like a fascinating challenge!”
It’s not like I have too many choices here, if I want to avert a coffee-less future.
“Thank you! You’re a brave man, Mr. Mikkonen.”
“Just leave me your contact details, I’ll get in touch with you when there’s been a development.”
“Of course, here’s my card. Report to me as soon as you come up with anything!”, R9 Fury utters as she hurries off, out of my office.
My hunch tells me that there’ll be some major hindrances which she conveniently forgot to mention. The card she gave me reads:
Ms. Radeon, R9 Fury
Jurock, room 138, JSC,
Hmm, Jurock. That certainly sounds like a proper computation cluster, but it’s located in someone’s office, not in the big machine hall. Her family certainly isn’t doing well at all.
Well, no use just sitting around, there’s coding to be done! My sluggish but trusty personal computation machine boots up and a Git login prompt appears. Let’s see, username: johannes.pekkilae… I peer into the CUDA FMM repository and the source code is cluttered with loop unrolls, preprocessor macros and lots of template magic! A headache emerges just from looking at it. It’s going to take me weeks to port this!
Pondering my next step, my thoughts drift back to that woman. The woman, and her HIPs. Of course, the HIP, Heterogeneous-compute Interface for Portability! It’s a collection of tools which enables the portable development of GPU code. Because of the prevalence of CUDA, HIP contains scripts with which one can ‘hipify’ their code, converting it from CUDA C into more general HIP C++. The perfect utility for this situation!
Except when it’s not. The CUDA function __ldg wasn’t available in the HIP framework yet, so the conversion failed. No matter: all that function basically does is speed up data reads by routing them through the read-only data cache. By replacing it with an ordinary read from global memory, we’ll get working code. We lose the performance benefit for now, but optimizations can be done afterwards.
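For the case file, here is a minimal sketch of how such a workaround might look. It is not the code from the FMM repository: the macro LOAD_READONLY and the function sum_charges are names I made up for illustration. The idea is simply to use __ldg where CUDA device code for compute capability 3.5 or newer is being compiled, and to fall back to a plain dereference everywhere else.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper, not taken from the FMM repository.
// Inside CUDA device code for compute capability 3.5+, route the read
// through the read-only data cache with __ldg; otherwise do a plain load.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)
#define LOAD_READONLY(ptr) __ldg(ptr)
#else
#define LOAD_READONLY(ptr) (*(ptr))
#endif

// Illustrative caller: sums an array of charges through the wrapper.
// As written this is a host function, so it always takes the plain-load
// path; a real kernel would be marked __global__ or __device__.
inline float sum_charges(const float* q, std::size_t n) {
    assert(q != nullptr || n == 0);
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += LOAD_READONLY(q + i);
    }
    return total;
}
```

The result is the same on either path; only the route the data takes through the memory hierarchy differs, which is why the swap is safe to make first and tune later.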
After a successful source code translation, it’s time to compile! Compilation is done with hipcc, a Perl script which calls either the nvcc or the hcc compiler with the appropriate parameters, depending on the target platform. Easy as A-M-D! Wait, “error: expected an expression”, “error: identifier is undefined”, “error: no instance of function template”. It appears that the script’s still in development. Well, at least the compiler itself didn’t experience a segmentation fault this time. This night is going to be long, but quite interesting.