Can’t `$ touch this`: Containerisation on permissioned HPC systems using Singularity
In my previous blog post, I discussed the advantages of containerisation when it comes to reproducibility. In short, if you package up all of the tools necessary to run your experiments, you can ensure that, in future, others can rerun your experiments with minimal hassle.
Since then, I have gained access to an HPC system at the Barcelona Supercomputing Center. There, the current design of Docker does not really work, as it requires root access. On a large shared HPC system, granting root permissions to every Tom, Dick, and Harry is not good practice, for security or for scheduling fairness.
Thus, I had to look for an alternative. I needed a containerisation process that would support a shared HPC system, including its workload management system (such as LSF or SLURM). It should also be easy to integrate with my work so far.
Enter Singularity, a containerisation system that is HPC-first. Until it was recommended by one of the BSC sysadmins, I had never heard of it. It integrates well with existing Docker work, allowing you to import existing Docker images as well as make Singularity-native ones. It also offers a number of domain-specific advantages.
In this post, I will walk through a minimum working example of creating a Singularity based experimental workflow.
On our HPC system, we will not have root access. However, creating images, and processes such as installing packages, generally do require it. Thus, with Singularity one should build and debug images locally, and then copy them over to the HPC system for execution. When testing, you should be sure that your experiments can run without root access, and be wary of where output files are written.
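In practice, the workflow looks roughly like this (the hostname and paths are placeholders; the individual commands are explained in the rest of this post):

```
# on your local machine (root available):
sudo singularity build --sandbox mwe.img mwe.def   # writable sandbox, for development
sudo singularity build mwe.simg mwe.def            # final compressed read-only image

# copy the final image to the HPC system:
scp mwe.simg user@hpc.example.org:~/

# on the HPC system (no root needed):
singularity run --app quick_run mwe.simg ~/results/
```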
This is a post for people who just want to be able to do the following:
- Build their code in a container, for reproducibility.
- Run that code on an HPC system for which they don’t have admin privileges.
It contains all of the information I wish I had in one place when I started this work.
Singularity and containerisation are massive topics, so I will focus on a small workflow. For more complete descriptions of each feature, I advise you to use the official docs.
- An HPC system with Singularity installed (check the available modules, or talk to your sysadmin)
- A system for which you have root access, for developing your container
- Some code you want to run (provided, along with everything described in this post)
First, you need to install Singularity locally (somewhere you have root access):
```
git clone https://github.com/singularityware/singularity.git
cd singularity
git fetch --all
git checkout 2.6.0
./autogen.sh
./configure --prefix=/usr/local
make
sudo make install
```
Building our first image
To describe and build our container, we use a simple text-file, which we will call the “container recipe”.
Here, you define:
- The base image to use for your image (e.g. Debian, Ubuntu, Alpine, some other operating system, or even a more fully developed image that you want to build on top of).
- The files you want to copy into the image.
- The setup process, of installing necessary packages, building your code, etc.
- The different “apps”, or “run scripts”, you want your container to perform. For example, your container could have a couple of different experiment modes that could be run. This provides a simple front-end for users of your container.
If you look at `mwe.def`, you will see we use the Debian Docker base image:
```
Bootstrap: docker
From: debian:latest
```
We then copy a number of directories to `/root/`, including our application code:
```
%files
setup /root/
application /root/
run /root/
```
When these directories have been copied, we run our setup and build scripts:
```
%post
bash /root/setup/setup_script.sh
bash /root/setup/build_app.sh
```
Let’s build our image!
```
sudo singularity build --sandbox mwe.img mwe.def
```
You’ll notice that, instead of images being stored in `/var/lib` like Docker, you will have all of your files dumped into your working directory. That’s how it is. Plan your directory layout accordingly.
When we are designing our experimental workflow, it is natural to make changes and explore which sequence of commands is needed to get things working. Thus, on your local machine, you will want to build sandbox images (notice the `--sandbox` flag we used). This means that you can connect to the container’s shell, and figure out what commands you need to run in the workflow.
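For example (assuming Singularity 2.x; add `--writable` only if you want your changes to persist in the sandbox):

```
# open an interactive shell inside the sandbox image
sudo singularity shell --writable mwe.img

# inside the container, experiment freely, e.g.:
#   apt-get install -y some-package
#   bash /root/setup/build_app.sh
```

Anything that turns out to be necessary should then be promoted into the recipe’s `%post` section.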
These changes are ephemeral by default, which is good. Any packages you install, or build commands you issue, which you are happy with should be added to your container recipe.
When you’re running your experiments, it’s usually simpler to invoke things with a single script, and specify an output directory for results. Issuing the same 14 commands every time is just busy work.
Personally, I recommend at least having a “quick script”, and a “full script”. The quick script should be a minimal version of your experiment, and ideally finish in a short period of time (hence “quick”). The purpose of this is not to collect data, but to test that your experimental workflow is working correctly.
Like Docker, Singularity supports this with the `run` command. You can get the default run behaviour of a container with:

```
singularity run mwe.img
```

In `mwe.def`, I have set this to print our run options for this container:
```
%runscript
exec echo "Try running with --app quick_run|full_run, and specify an output directory"
```
For example, to run our quick experiment, run like so:

```
singularity run --app quick_run mwe.img ~/results/
```
Notice that we didn’t need to use `sudo` to run the container. This makes it easy to submit to job processing systems like SLURM and LSF.
In our Singularity recipe, we can define our run options:
```
%apprun quick_run
exec bash /root/run/quick_script.sh "$@"

%apprun full_run
exec bash /root/run/full_script.sh "$@"
```
The syntax for arguments is the same as in normal bash, so `$@` represents all arguments, `$1` the first, `$2` the second, and so on.
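To illustrate, here is a small sketch of how those positional parameters expand inside an `%apprun` script. The path and the extra `fast` flag are hypothetical; `set --` simply simulates the arguments Singularity would pass through:

```shell
#!/bin/bash
# Simulate the arguments that something like
#   singularity run --app quick_run mwe.img /scratch/results fast
# would forward into the app script.
set -- /scratch/results fast

echo "all  : $@"   # all  : /scratch/results fast
echo "first: $1"   # first: /scratch/results
echo "count: $#"   # count: 2
```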
Note that we passed `~/results/` as an argument. This is a path in our host filesystem, outside the container. This is an important point.
By default, your home directory and filesystems such as `/dev` are mounted inside the container from the host OS. Scripts inside the container can therefore write to these locations, with the same permissions as the user. If you run the container without root permissions, you will find that most of the filesystem is inaccessible to you. This is by design, and it is why you should prefer to put output directories in the auto-mounted locations. You can also manually mount directories; however, this is often disabled. See the docs for more information.
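As a sketch, manually binding a host directory (if your sysadmin has left this enabled) uses the `--bind` flag; the `/data` and `/mnt` paths here are hypothetical:

```
singularity run --bind /data:/mnt --app full_run mwe.simg /mnt/results/
```

The part before the colon is the host path, and the part after is where it appears inside the container.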
Moving to our HPC system
Building to the `.simg` file type, without `--sandbox`, signals to Singularity to make the image a compressed read-only filesystem. This is ideal for our needs, as it reduces filesizes:

```
sudo singularity build mwe.simg mwe.def
```

Copy `mwe.simg` to your HPC system. You might want to go for a coffee – even when compressed, containers are not the most space-efficient systems.
You can now try running the quick script, to test that things are working. Make sure the Singularity module is loaded in your environment, if that’s how your HPC system is set up. Hopefully all is well, and you don’t get any permission issues. If you do, make sure that your code is not trying to write to any locations that are unmounted, or that you lack permission to write to.
Finally, we can try the full experiment:

```
singularity run --app full_run mwe.simg ~/results/
```
Try submitting to your HPC system’s job scheduler.
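As a sketch, a SLURM batch script for the full experiment might look like the following; the job name, time limit, and `module load` line are assumptions, so check your site’s conventions:

```
#!/bin/bash
#SBATCH --job-name=mwe_full
#SBATCH --output=mwe_full_%j.out
#SBATCH --time=08:00:00
#SBATCH --ntasks=1

module load singularity   # if your site provides Singularity as a module
singularity run --app full_run mwe.simg "$HOME/results/"
```

Submit it with `sbatch`; on an LSF system, the equivalent would use `bsub` with `#BSUB` directives instead.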
Hopefully this post serves as a good introduction to making reproducible experiments for HPC. You won’t incur much of an overhead running in a container, so I really encourage you to consider doing this more often.
Final things you should consider:
- How does the architecture of your host machine differ from your target system? Plan compilation flags accordingly.
- What specific package versions are essential to your experiment? You can’t trust that package repositories will be around forever. If your experiment absolutely needs a particular version of a tool, it is better to download a copy of the package and install it from file.
- How are you going to store the project image and related files? Results can’t be reproduced if people can’t access them.