Container Tools, Tips, and Tricks - Issue #4


When Containers Are Not Enough

Believe it or not, containers are virtualization means. Even Linux containers that are ā€œjust isolated and restricted processesā€ can make a single server look like a hundred independent ā€œmachinesā€ with their own network stacks and filesystems. And this is, by definition, virtualization.

Having a container per application is handy - you can choose a Linux flavor that suits your needs the best, install the applicationā€™s dependencies without fear of clashing with the neighbors, and enjoy the subsecond startup time, thanks to the ā€œshared kernelā€ architecture.

However, sometimes, the virtualization provided by Linux containers may be too limiting. For instance, from time to time, I need to access Docker from within a container, but neither mounting the hostā€™s docker.sock file into the container nor running Docker in Docker (aka dind) sounds good enough to me (because of security and performance implications). Another typical example is when extra boundaries (beyond namespaces, cgroups, and seccomp profiles) are required to protect the host from the workloads and the workloads from each other.

A solution that not only looks like providing a ā€œmachineā€ per application but truly creates these "machines" might be much more preferable in cases like the above.

Instead of relying on OS-level virtualization means, as Linux containers do, our ideal tool needs to be virtualizing the actual hardware where a separate Linux kernel (and maybe the rest of the operating system) can be booted. And thatā€™s exactly what good old virtual machines do. But we got used to almost instant startup times of our containers, won't the virtual machines be too slow for us?

Turns out, some virtual machine monitors are faster than others!

Cracking VM performance

ā€‹Firecracker looks like a good option if you need to run virtual machines that boot (almost) as fast as containers. The official starting guide is fairly straightforward, and Alex Ellis also made his own version of the starting guide showing additionally how to configure VM networking. Long story short, you need to get an uncompressed kernel binary and a (disk image of the) root filesystem, start the firecracker process, and point it to the said files using the HTTP API it exposes.

I was able to complete the guide from the first attempt without much trouble:

The feeling that I could have a bunch of Ubuntu (micro)VMs up and running in no time was just amazing. And at first sight, they even worked fineā€¦

But then I tried running Docker inside one of the VMs, and it wouldnā€™t start. The pity is that I couldnā€™t even check the systemā€™s compatibility because CONFIG_IKCONFIG wasnā€™t enabled in the sample kernel.

Apparently, the provided sample kernel binary is pretty old (4.14.x IIRC), and was compiled using a firecracker-optimized set of configs that are tailored for serverless workloads.

My first thought emotion was to figure out the right set of kernel configs myself. It turns out compiling a kernel is a simple task! Especially if you use a helper builder container:

# syntax=docker/dockerfile:1
FROM ubuntu:20.04 as builder

RUN <<EOF
set -eu

apt-get update
apt-get install -y bc bison build-essential \
  ccache flex gcc-7 git kmod libelf-dev \
  libncurses-dev libssl-dev wget ca-certificates

update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 10
EOF

From within the above container, you can build your own kernel with something like this:

git clone \
  --depth 1 \
  --branch v5.10.77 \
  git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git \
  linux

cd linux

# Copy your tweaked config to .config

make clean mrproper
make olddefconfig

# Build the kernel
make -j$(nproc)

# Your kernel is at ./vmlinux

But even though the above snippet takes just a couple of minutes on my moderately performant server (Intel Core i7-8700 CPU @ 3.20GHz), my kernel knowledge (or rather lack of it) didnā€™t allow me to figure out the right set of configs within a reasonable number of attempts. And even if I would come up with a good enough kernel build, while simple, the original Firecracker UX is still pretty far away from the convenience of docker run.

Igniting microVMs with ease

Luckily, folks from Weaveworks have already figured everything out! The magical Weave Ignite project makes launching Firecracker microVMs as smooth as Docker containers.

Weave Ignite is a relatively thin wrapper (~20K lines of Go) around Firecracker that comes bundled with a set of precompiled kernels (at the time of writing this, the version list includes 4.14.x, 4.19.x, 5.4.x, 5.10.x, 5.14.x, and more) and root filesystems (Ubuntu 20.04, CentOS 8, Amazon Linux 2, K3s, etc). Kernels are based on the (already familiar to us) firecracker-optimized configs but with Weaveworks-authored patches applied on top to allow running tools like Docker and K3s inside of ignite-started microVMs.

Both prebuilt kernels and root filesystems are conveniently packed as OCI images and stored on DockerHub (but you can build and import your own if you like).

ā€‹Installation of the tool is relatively straightforward (using a bare-metal machine is a good idea but nested virtualization may also be an option):

  • Ensure the dependencies (apt-get install -y containerd dmsetup ...).
  • Download the CNI plugins.
  • Download the ignite and (optional) ignited binaries.

After you have ignite somewhere in your PATH, starting a microVM becomes as simple as:

# Pull in the right version of the kernel.
$ ignite kernel import weaveworks/ignite-kernel:5.10.77-amd64

# Pull in the rootfs of choice.
$ ignite image import weaveworks/ignite-k3s:latest

# Start the microVM.
$ ignite run weaveworks/ignite-k3s:latest \
  --kernel-image weaveworks/ignite-kernel:5.10.77-amd64 \
  --name my-vm \
  --cpus 2 \
  --memory 4GB \
  --size 10GB \
  --ssh \
  --interactive

One of the cool things about Ignite is how it leverages containers and the surrounding ecosystem. Not only rootfs and kernel images are stored and distributed as container images, but also containers themselves are used to run microVMs! For every ignite run (which is, much like docker run, just a shortcut for ignite create followed by ignite start), Ignite starts a sandbox Alpine container (using a local containerd daemon) that runs a special ignite-spawn binary. The ignite-spawn process serves as a launcher of the firecracker process that will represent the future VM (once it receives all the configs via the HTTP API it exposes).

Interesting that the firecracker jailer is not used by Ignite. The jailer is supposed to be restricting the firecracker processes even further by running it as a non-root user and using a tight seccomp profile. The ignite-spwan process seems to be running as root and in a quite privileged container (ctr -n firecracker c info ignite-081d6a7249aed6dc shows that CAP_SYS_ADMIN is used), so this design choice is rather questionable. Nevertheless, having a disposable container around the firecracker process is handy for garbage collection - no need to care about various filesystem and networking leftovers when the VM terminates.

Here is what the process tree looks like on the host:

$ ps axfo pid,ppid,user,command
   PID    PPID  USER  COMMAND
   ...
238567       1  root  /usr/bin/containerd-shim-runc-v2 -namespace firecracker -id ignite-03922f0748b8e931
238588  238567  root   \_ /usr/local/bin/ignite-spawn --log-level=info 03922f0748b8e931
238674  238588  root       \_ firecracker --api-sock /var/lib/firecracker/vm/03922f0748b8e931/firecracker.soc

Using microVMs in the wild

Ok, itā€™s all fun, but you may rightfully ask, ā€œWhat am I supposed to do with this knowledge?ā€

Iā€™m a big fan of VM-based disposable and isolated dev environments and playgrounds. Traditionally, Iā€™ve been using VirtualBox/Vagrant for that. But VirtualBox is pretty heavy-weight. Itā€™s fine when itā€™s a longer-term project, but it creates friction for quick experimentation. With Ignite, though, you can get a full-blown VM in under a second (assuming the images have already been pulled), isnā€™t it just amazing? You can ssh into it, install every tool you need, break stuff as much as you want, and then just tear it down, leaving your host system clean and tidy.

Wanna keep it more boring real? You can use Ignite in your CI/CD to make it more reproducible and secure! Weaveworks folks claim itā€™s designed to be a ā€œGitOps-firstā€ project (remember this second ignited binary - itā€™s a reconciler).

And, of course, you can bake your own rootfs images containing all the tools you need - with Docker, itā€™s as simple as writing a Dockerfile and then building it to a folder using docker buildx build -o rootfs. Look how neat this Igniteā€™s Ubuntu + K3s example.


Fun fact: I wrote a blog post about this technique back in 2019 - little did I know that itā€™s used in the wild - the accompanying GitHub project even gained a few hundred stars since then.


Last but not least, even if Ignite is not directly suitable for your needs (it also looks a bit unmaintained at the moment), you still can learn from it! For instance, I use it as an inspiration and a source of ideas when Iā€™m working on my learn-by-doing platform:

twitter profile avatar
Ivan Velichko
Twitter Logo
@iximiuz
January 13th 2023
118
Retweets
468
Likes

Ivan Velichko

Building labs.iximiuz.com - a place to help you learn Containers and Kubernetes the fun way šŸš€

Read more from Ivan Velichko

Hello šŸ‘‹ Ivan's here with a slightly delayed September roundup of all things Linux, Containers, Kubernetes, and Server Side šŸ§™ What I was working on This month, I worked on an assorted set of topics. Skill Paths First off, the skill paths! I finally finished the underlying machinery, and now iximiuz Labs supports a new type of content - short roadmaps that you can use to develop or improve a specific skill: how to debug distroless containers, how to copy images from one repository to another,...

Hello friends! Ivan's here with another monthly roundup of all things Linux, Containers, Kubernetes, and Server Side šŸ§™ The issue's main topic is iximiuz Labs' largest-ever upgrade: Fresher and more streamlined look of the frontend UI šŸ’™ A new 5.10 Linux kernel built with nftables support (finally, we can try out kube-proxy's nftables mode). New default playground user - laborant (yep, rootless containers learning for). New playgrounds: Ubuntu 24.04, Debian Trixie, Fedora, and Incus (yay! more...

Hello friends! Ivan's here with a slightly delayed July roundup of all things Linux, Containers, Kubernetes, and Server Side šŸ§™ What I was working on This month, I got nerd-sniped by cgroups. It all started when I ran into a pretty significant difference in how Docker and Kubernetes handle the OOM events. When you limit the memory usage of a multi-process Docker container, the OOM killer often terminates only one of the processes if the container runs out of memory. If this process is not the...