Container Tools, Tips, and Tricks - Issue #3


Hi friends!

Let's talk about desktop container environments today.

When we say "a container", most of the time, we mean a Linux container in the form that was popularized (and later even standardized) by Docker. Everyone knows that Linux containers are made of namespaces and cgroups, so a Linux kernel is a must to run such a container. But how come these containers also seem to work on macOS and Windows?

Desktop container environment architecture

The answer is simple (unlike the implementation): they run in a Linux virtual machine. Most, if not all, container runtimes in Bret Fisher's amazing list that are marked as supporting Windows and/or macOS use a very similar architecture: the container engine and its lower-level components run inside a Linux VM that host-side tooling starts and manages.

The above VM-based architecture became a de facto standard for desktop container environments. And it makes total sense. It allows the inner piece of software, like Docker Engine, containerd, or Podman, to run (almost?) unmodified compared to the server-side setup. Yes, desktop container environments bring a whole bunch of extra software to spin up the VM and provide good-enough disk and network connectivity with the host system (and the outside world). But thanks to the VM at the heart, even on Windows and macOS, the actual containers are executed by pretty much the same set of lower-level components as on your production servers (read: containerd and runc).

Most popular implementations

Let’s take a quick look at how different container runtimes implement this architecture.

Docker Desktop uses a lightweight LinuxKit VM that is run by WSL2 on Windows and, since relatively recently (March 2022), by the Apple Virtualization Framework on macOS (before that, it was HyperKit on Intel Macs and QEMU on Apple Silicon). While the LinuxKit project is open source, Docker Desktop's plumbing is (at least partially) proprietary, so we cannot know all the implementation details. But the bird's-eye picture is pretty clear. More details on the Docker Desktop architecture can be found on the Docker blog.

Lima packs together containerd, BuildKit, and nerdctl to create something that can be used as a replacement for Docker (for Docker Engine rather than Docker Desktop, IMO). It uses good old QEMU to spin up a custom Linux VM where these components run, and the choice of virtualization technology (QEMU) stays the same across all supported host platforms (Linux and macOS, on both amd64 and arm64).

UPD: I just learned that initial support for the Apple Virtualization framework landed in Lima a couple of weeks ago, so we may soon see non-QEMU Lima virtual machines on macOS.

Finch is essentially Lima plus a handy installer (plus AWS-specific integrations if needed). So, no difference from the architectural standpoint, I guess.

Colima is another Lima-based project. Compared to Finch, Colima follows a different strategy of extending Lima: it adds support for different container runtimes (Docker Engine, containerd, and even Kubernetes via K3s). But from the host system's point of view, there is not much difference from Lima's original architecture: all the supported runtimes are fully encapsulated in a Linux virtual machine managed by Lima.

Rancher Desktop may at first look more similar to Docker Desktop than to Finch or Colima (especially from the GUI/UX point of view). But under the hood, it's yet another Lima application (on macOS and Linux; on Windows, it relies on WSL2 instead). Compared to Lima's own offering, Rancher Desktop can additionally run Docker Engine (Moby) and/or K3s in the VM, but the architecture is, again, pretty much the same.

Running cross-platform containers

All the above desktop container environments start a virtual machine of the same architecture as the host system. On my Apple Silicon MacBook, docker run ubuntu uname -m prints aarch64, and on the Intel MacBook, it prints x86_64.

This behavior makes total sense: such a choice allows using hardware-assisted virtualization, so the performance of the virtual machine is close to that of the host system.

At the same time, it’s possible to run amd64 containers on arm64 desktops and vice versa:

$ docker run -d --platform linux/amd64 nginx
$ docker run -d --platform linux/arm64 nginx
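
Note that the names differ depending on who you ask: uname -m inside the Linux VM reports x86_64 or aarch64 (and macOS itself reports arm64 on Apple Silicon), while --platform expects the OCI-style linux/amd64 and linux/arm64. Below is a minimal shell sketch of that mapping, plus a helper that picks the "other" platform, i.e., the one that will need emulation. Both function names, oci_platform and foreign_platform, are made up for this example, and only the two architectures discussed in this issue are covered.

```sh
# Hypothetical helper: map a `uname -m` machine name to an OCI platform string.
# Only x86_64/amd64 and aarch64/arm64 are handled in this sketch.
oci_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the platform that is NOT native to the given
# machine name, i.e. the one whose containers will run under CPU emulation.
foreign_platform() {
  case "$(oci_platform "$1")" in
    linux/amd64) echo "linux/arm64" ;;
    linux/arm64) echo "linux/amd64" ;;
    *) return 1 ;;
  esac
}
```

For example, docker run -d --platform "$(foreign_platform "$(uname -m)")" nginx should start an emulated nginx on either kind of MacBook.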

But if there is just one virtual machine, how does the cross-platform support work?

TL;DR: it's a combination of two technologies, binfmt_misc and CPU emulation in user space. The first one allows registering custom executable formats so that the kernel knows which user-space program to invoke when a file of that format is about to be executed (similar to the shebang trick, like #!/usr/bin/env python3, that we use to run Python programs like normal shell scripts). The second one means there is a special helper program (often QEMU, but sometimes Rosetta 2) that can run arm64 binaries on amd64 or the other way around.
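
To make the binfmt_misc part less abstract: a "magic" rule tells the kernel to compare the first bytes of a file, under a mask, with a given pattern. For ELF binaries, the interesting part of that pattern is the e_machine field at offset 18, which says which CPU the binary was built for (0x3e for x86_64, 0xb7 for aarch64). The shell sketch below reads that field the way such a rule would see it, and composes a line in the :name:type:offset:magic:mask:interpreter:flags format that the kernel accepts in /proc/sys/fs/binfmt_misc/register. The function names elf_arch and binfmt_rule are hypothetical; real registrations (e.g., QEMU's own setup scripts) typically use longer magic/mask strings that also pin down the ELF class, endianness, and file type.

```sh
# Hypothetical helper: report the CPU architecture an ELF binary targets,
# using the same header bytes a binfmt_misc "magic" rule would compare.
elf_arch() {
  f=$1
  # Bytes 0-3 must be the ELF magic: 0x7f 'E' 'L' 'F'.
  magic=$(od -An -tx1 -N4 "$f" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not an ELF file: $f" >&2
    return 1
  fi
  # Byte 5 (EI_DATA): 01 = little-endian, 02 = big-endian.
  data=$(od -An -tx1 -j5 -N1 "$f" | tr -d ' \n')
  # Bytes 18-19: e_machine, stored in the file's own byte order.
  set -- $(od -An -tx1 -j18 -N2 "$f")
  if [ "$data" = "01" ]; then machine="$2$1"; else machine="$1$2"; fi
  case "$machine" in
    003e) echo "x86_64" ;;
    00b7) echo "aarch64" ;;
    *)    echo "other (e_machine=0x$machine)" ;;
  esac
}

# Hypothetical helper: build a registration line in the format
# :name:type:offset:magic:mask:interpreter:flags (type M = match by magic,
# empty offset = 0). Escapes like \x7f are passed through verbatim;
# the kernel decodes them when the line is written to the register file.
binfmt_rule() {
  printf ':%s:M::%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

Inside the VM (not on macOS itself, where native binaries are Mach-O rather than ELF), elf_arch /bin/ls should print the VM's native architecture. Actually registering a rule means writing a binfmt_rule-style line to the register file as root.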

Here is what the process tree looks like for the above two nginx containers from inside the LinuxKit VM (Docker Desktop on Intel):

$ docker run -it --pid host ubuntu ps auxf

And here is the process tree produced by Finch on an Apple Silicon MacBook (from inside the VM, of course):

$ finch run -it --pid host ubuntu ps auxf
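
When reading such a process tree, the tell-tale sign of emulation is the helper program appearing in the command line of the foreign-architecture processes: binfmt_misc makes the kernel launch the registered helper with the original binary's path as an argument. Below is a tiny sketch of a filter for that, assuming the helper's path contains "qemu-" followed by an architecture name (as with QEMU's user-mode binaries such as qemu-x86_64) or "rosetta". Both patterns and both function names are assumptions for illustration, so adjust them to whatever your VM actually shows.

```sh
# Hypothetical filter: read `ps auxf` output on stdin and print only the
# processes that appear to run under a user-space CPU emulator.
# The "qemu-<arch>" and "rosetta" patterns are assumptions.
emulated_procs() {
  grep -E 'qemu-[a-z0-9_]+|rosetta'
}

# Hypothetical summary: count matching processes per emulator name,
# taking only the first match on each line (prints e.g. "qemu-x86_64 2").
emulator_summary() {
  awk 'match($0, /qemu-[a-z0-9_]+|rosetta/) {
         n[substr($0, RSTART, RLENGTH)]++
       }
       END { for (e in n) print e, n[e] }'
}
```

Usage would look like docker run --rm --pid host ubuntu ps auxf | emulated_procs, with -it dropped because the output goes to a pipe rather than a terminal.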

Short security note

Regardless of my host OS choice, I'd use a separate (often Vagrant) virtual machine for every project (or a tightly coupled group of projects). Of course, most of the time, this VM would include a Docker Engine, so if needed, I could run a container from inside the VM. However, at times, I still need to run a container directly from the host system. That's not a big problem on macOS, but on a Linux host, I'd need to run a Docker Engine right on the host system. And that would make me extremely nervous because I generally try to avoid running random stuff from the Internet on my host systems, and containers aren't security devices. Luckily, since May 2022, Docker Desktop can be used on Linux hosts too, so I can benefit from the stronger isolation provided by its virtual machine. No more Docker Engine running directly on the host for me ❤️‍🔥

---

Interesting fact: did you know that there are Windows-native containers too? This type of container has nothing to do with WSL2 (aka the lightweight Linux VM on Windows), but it's nevertheless an OCI-standard form of containers. That said, I have no idea how widespread they are.

Hope this was an informative one!

Cheers

Ivan

Ivan Velichko

Building labs.iximiuz.com - a place to help you learn Containers and Kubernetes the fun way 🚀