Let's continue on the topic of Desktop Container Environments. This issue will focus specifically on running cross-platform containers:
There are different types of containers, but the most widespread type is Linux containers. In fact, they are so predominant that people usually omit the Linux part of the name when referring to them. Running such a [Linux] container on macOS or Windows requires a virtual machine - simply because only a real Linux kernel can provide the container runtime with the required building blocks like namespaces and cgroups. Even on Linux, using a separate VM might be a good idea to isolate containers further from the host, especially when the host system is your personal laptop. Provisioning such a service VM is the responsibility of the Desktop Container Environment - that's why Docker Desktop, Rancher Desktop, Podman Desktop, Lima, and OrbStack all implement a very similar architecture:
If you stare at the above diagram long enough, you may notice that QEMU is mentioned there twice - as a means of creating VMs and as a mysterious CPU emulator. Differentiating between these two QEMU modes is very important if you want to form a holistic understanding of the domain.
Forgetting about containers and VMs for a second, if you try running an ARM64 binary on an AMD64 Linux machine, most likely it'll fail with an error like "cannot execute binary file: Exec format error." It happens because the system doesn't understand the instructions from the ARM64 binary. However, there is a clever way around it that doesn't involve the "expensive" emulation of a full-blown ARM64 machine - translating the ARM64 instructions into AMD64 instructions while (or shortly before) executing the binary.
QEMU is not a single tool but rather a diverse collection of programs, and in particular, it has a family of commands known as qemu-user that can perform translations of a foreign instruction set into a native one:
$ cat > main.go <<EOF
> package main
>
> func main() {
> println("Hello world")
> }
> EOF
$ GOOS=linux GOARCH=arm64 go build -o main_arm64 main.go
$ ./main_arm64
bash: ./main_arm64: cannot execute binary file: Exec format error
$ apt-get install qemu-user
$ qemu-aarch64 ./main_arm64
Hello world
$ ./main_arm64
Hello world
The above snippet shows that after installing the qemu-user package, the main_arm64 binary becomes directly invocable too - thanks to the special kernel capability called binfmt_misc that allows registering custom user-space interpreters for different types of executables.
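binfmt_misc picks an interpreter by comparing the first bytes of a file against a registered magic value under a mask (the offset, magic, and mask fields of the ":name:type:offset:magic:mask:interpreter:flags" registration string). A minimal Go sketch of that comparison follows, assuming the aarch64 magic/mask pair used by QEMU's qemu-binfmt-conf.sh script; the binfmtRule type and the binfmtmatch program are illustrative, and the real kernel logic also supports extension-based rules and flags.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// binfmtRule mirrors the offset/magic/mask fields of a binfmt_misc registration.
type binfmtRule struct {
	Name   string
	Offset int
	Magic  []byte
	Mask   []byte
}

// aarch64Rule: "\x7fELF", 64-bit class, little-endian, version 1, then
// e_type (masked with 0xfe so both ET_EXEC and ET_DYN match) and e_machine 0xb7.
var aarch64Rule = binfmtRule{
	Name:   "qemu-aarch64",
	Offset: 0,
	Magic: []byte{
		0x7f, 'E', 'L', 'F', 0x02, 0x01, 0x01, 0x00,
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
		0x02, 0x00, 0xb7, 0x00,
	},
	Mask: []byte{
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00,
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
		0xfe, 0xff, 0xff, 0xff,
	},
}

// matches reports whether every header byte agrees with the magic in all
// bit positions selected by the mask.
func (r binfmtRule) matches(header []byte) bool {
	if len(header) < r.Offset+len(r.Magic) {
		return false
	}
	for i := range r.Magic {
		if (header[r.Offset+i]^r.Magic[i])&r.Mask[i] != 0 {
			return false
		}
	}
	return true
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: binfmtmatch <binary>")
		os.Exit(2)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	// Read up to 128 bytes; short files are fine, they simply won't match.
	header := make([]byte, 128)
	n, err := io.ReadFull(f, header)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s handles %s: %t\n", aarch64Rule.Name, os.Args[1], aarch64Rule.matches(header[:n]))
}
```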
Thus, we can:
Of course, nothing should stop us from trying this trick with containers. A vanilla Docker Engine installation likely won't allow you to run cross-platform containers, but there is a well-known tonistiigi/binfmt image that brings cross-platform support to Docker Engine (or containerd), and it does something very similar to apt-get install qemu-user from above:
$ docker run --platform linux/arm64 nginx
exec /docker-entrypoint.sh: exec format error
$ docker run --privileged --rm tonistiigi/binfmt --install arm64
$ docker run --platform linux/arm64 nginx
...
2023/07/22 17:16:58 [notice] 1#1: using the "epoll" event method
2023/07/22 17:16:58 [notice] 1#1: nginx/1.25.1
2023/07/22 17:16:58 [notice] 1#1: built by gcc 12.2.0 (Debian 12.2.0-14)
2023/07/22 17:16:58 [notice] 1#1: OS: Linux 5.10.175
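To see which handlers the tonistiigi/binfmt image (or the qemu-user package) left behind on the machine or VM running the engine, one can look at /proc/sys/fs/binfmt_misc: every registered handler is a file with one "key value" pair per line. The Go sketch below lists and parses those files; the struct and function names are illustrative. It also reports the F ("fix binary") flag, which makes the kernel open the interpreter at registration time so it stays usable inside containers whose filesystems don't contain it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// binfmtEntry holds the fields the kernel prints for a magic-based handler.
type binfmtEntry struct {
	Enabled     bool
	Interpreter string
	Flags       string
	Offset      int
	Magic       string
	Mask        string
}

// parseBinfmtEntry parses the "key value" lines of a binfmt_misc entry.
// Note that the flags line is printed as "flags:" with a trailing colon.
func parseBinfmtEntry(text string) (binfmtEntry, error) {
	var e binfmtEntry
	for _, line := range strings.Split(strings.TrimSpace(text), "\n") {
		key, value, _ := strings.Cut(strings.TrimSpace(line), " ")
		switch key {
		case "enabled":
			e.Enabled = true
		case "disabled":
			e.Enabled = false
		case "interpreter":
			e.Interpreter = value
		case "flags:":
			e.Flags = value
		case "offset":
			n, err := strconv.Atoi(value)
			if err != nil {
				return e, fmt.Errorf("bad offset %q: %w", value, err)
			}
			e.Offset = n
		case "magic":
			e.Magic = value
		case "mask":
			e.Mask = value
		}
	}
	return e, nil
}

func main() {
	const dir = "/proc/sys/fs/binfmt_misc"
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, d := range entries {
		// "register" and "status" are control files, not handlers.
		if d.Name() == "register" || d.Name() == "status" {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, d.Name()))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		e, err := parseBinfmtEntry(string(data))
		if err != nil {
			fmt.Fprintln(os.Stderr, d.Name(), err)
			continue
		}
		fixed := strings.Contains(e.Flags, "F")
		fmt.Printf("%-20s enabled=%t fixed=%t %s\n", d.Name(), e.Enabled, fixed, e.Interpreter)
	}
}
```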
Summarizing, there are two different problems - a) how to run cross-platform containers and b) how to launch a VM - and QEMU (well, different parts of it) just happens to be able to address both, but we should clearly differentiate between a and b.
Why? Because thinking by analogy is a potent technique.
Apple's Virtualization Framework ≈ Microsoft's Hyper-V ≈ QEMU for VMs.
Rosetta ≈ QEMU for user space emulation.
The devil is in the details, of course, but conceptually I find this approximation practical. And understanding the nature of tools helps to predict what should be possible and what's not. For instance, if Apple's Virtualization Framework is for running VMs, it should be possible to have a non-QEMU VM with qemu-user emulation. And at the time of writing this (Jul 2023), Docker Desktop for Mac indeed supports such a mode.
Here is my take on the most common ways Desktop Container Environments do cross-platform today:
Now that we're done with the theory, let's take a look at OrbStack - a shiny new container runtime that claims to be a drop-in (and faster) replacement for Docker Desktop for Mac.
The OrbStack feature that actually caught my eye wasn't its performance. It wasn't even the fact that containers started with OrbStack can be accessed by their IP addresses from the macOS host (which is pretty cool, by the way). It was the promised support of AMD64 VMs on Apple Silicon.
Hypothetically, it should indeed be possible for a Desktop Container Environment to run not one but two or more VMs - one per requested container architecture. For instance, AMD64 containers could go to an AMD64 VM, and ARM64 containers could go to an ARM64 VM. However, full-blown hardware emulation is usually slow, so Desktop Container Environments typically start just one VM of the same architecture as the host system and use the user-space emulation trick for the rest.
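The routing decision described above can be sketched as a small function. This is a hypothetical illustration, not code from any Desktop Container Environment: it parses a platform string such as linux/amd64 and picks a strategy depending on whether the architecture matches the host and whether per-architecture VMs are available (the dedicatedVMs switch and the strategy names are assumptions).

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// strategy names how a hypothetical Desktop Container Environment could run
// a container of a given platform.
type strategy string

const (
	nativeVM     strategy = "native VM"
	userSpaceEmu strategy = "native VM + user-space emulation"
	foreignVM    strategy = "dedicated full-emulation VM"
)

// pickStrategy parses a platform like "linux/arm64" or "linux/arm64/v8"
// and decides where the container should run.
func pickStrategy(platform, hostArch string, dedicatedVMs bool) (strategy, error) {
	goos, rest, ok := strings.Cut(platform, "/")
	if !ok || goos != "linux" {
		return "", fmt.Errorf("unsupported platform %q", platform)
	}
	arch, _, _ := strings.Cut(rest, "/") // drop an optional variant such as "v8"
	switch {
	case arch == hostArch:
		return nativeVM, nil
	case dedicatedVMs:
		return foreignVM, nil // slow: full hardware emulation (e.g. QEMU)
	default:
		return userSpaceEmu, nil // the common choice: qemu-user or Rosetta
	}
}

func main() {
	for _, p := range []string{"linux/amd64", "linux/arm64", "linux/arm64/v8"} {
		s, err := pickStrategy(p, runtime.GOARCH, false)
		if err != nil {
			fmt.Println(p, "error:", err)
			continue
		}
		fmt.Printf("%-16s -> %s\n", p, s)
	}
}
```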
So, when I saw the following option in the OrbStack UI, I was truly intrigued:
And I became even more intrigued when the requested VM booted in no time and the performance inside felt close to native. But there are no miracles.
Yes, the software inside thinks it's an AMD64 machine. Even uname says so. However, the actual CPU architecture is ARM64, and it's Rosetta user-space emulation all the way down, starting from systemd. I didn't believe it until the very end: only after I compiled two Go binaries, one for AMD64 and one for ARM64, and saw the latter run without Rosetta in its process tree did I finally accept reality. A clever trick, but not what I was hoping for...
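A process-tree check in the spirit of the two-binary experiment can be sketched in Go for a Linux guest: the program prints the architecture it was compiled for and walks its ancestors via /proc/<pid>/stat, printing each process's comm and argv[0]. This is an illustration under assumptions, not necessarily how the check above was done; in particular, whether and under which name an emulator like Rosetta shows up in this listing depends on how it is registered with binfmt_misc, so treat the output as a hint rather than proof. Build it twice (GOARCH=amd64 and GOARCH=arm64) and compare the outputs.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// parseStat extracts comm and the parent PID from /proc/<pid>/stat. The comm
// field is wrapped in parentheses and may itself contain spaces or ')', so
// the line is split at the last ')'.
func parseStat(stat string) (comm string, ppid int, err error) {
	open := strings.IndexByte(stat, '(')
	end := strings.LastIndexByte(stat, ')')
	if open < 0 || end < open {
		return "", 0, fmt.Errorf("malformed stat line: %q", stat)
	}
	comm = stat[open+1 : end]
	// After comm come: state, ppid, pgrp, ...
	fields := strings.Fields(stat[end+1:])
	if len(fields) < 2 {
		return "", 0, fmt.Errorf("malformed stat line: %q", stat)
	}
	ppid, err = strconv.Atoi(fields[1])
	if err != nil {
		return "", 0, err
	}
	return comm, ppid, nil
}

// argv0 returns the first NUL-separated element of /proc/<pid>/cmdline.
func argv0(pid int) string {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/cmdline", pid))
	if err != nil || len(data) == 0 {
		return "?"
	}
	first, _, _ := bytes.Cut(data, []byte{0})
	return string(first)
}

func main() {
	fmt.Println("compiled for GOARCH:", runtime.GOARCH)
	// Walk up the tree until PID 1 (whose parent PID is 0).
	for pid := os.Getpid(); pid > 0; {
		data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		comm, ppid, err := parseStat(string(data))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		fmt.Printf("%7d  %-16s %s\n", pid, comm, argv0(pid))
		pid = ppid
	}
}
```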
I've been on the lookout for a more "native" way to run AMD64 containers on Apple Silicon for quite a while. QEMU user-space emulation is great, but its success rate isn't 100% - not every image works fine under user-space emulation. For instance, qemu-user doesn't implement inotify, and it has been a problem for github.com/slimtoolkit/slim (aka DockerSlim), which, in particular, relies on inotify to track filesystem events. Trying Rosetta as an alternative sounded promising, but slim build nginx from inside an OrbStack-powered VM didn't succeed either.
And that's when Lima saved the day again. It turns out that with Lima, you can start an AMD64 VM on an Apple Silicon Mac by editing just one line in the template file (via QEMU, of course: Lima can use the Virtualization Framework, but the framework supports only VMs of the host's own architecture). The trick also works on Linux - you can start an ARM64 VM on an AMD64 Linux host:
Of course, this setup will be much slower than the user-space emulation, but on my very basic M1 MacBook Air 2020, slim build nginx finished successfully in a Lima-powered AMD64 VM, which is a win, IMO. The bottom line, though - native execution is the only reliable and performant way to run containers, at least for now.
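For reference, the one-line change in the Lima template is its arch setting. The Go sketch below maps Go architecture names to the values Lima templates use (x86_64, aarch64) and renders the corresponding lines. The explicit vmType "qemu" line for foreign guests is an assumption added for safety: the Virtualization Framework (vmType "vz") can only run guests of the host's own architecture, so pinning QEMU avoids depending on whatever default a given Lima version picks.

```go
package main

import (
	"fmt"
	"runtime"
)

// limaArch maps Go's GOARCH names to the arch values used in Lima templates.
func limaArch(goarch string) (string, error) {
	switch goarch {
	case "amd64":
		return "x86_64", nil
	case "arm64":
		return "aarch64", nil
	}
	return "", fmt.Errorf("unsupported GOARCH %q", goarch)
}

// templateLines renders Lima template lines for a guest architecture.
// A foreign guest is pinned to QEMU, because "vz" only runs native guests.
func templateLines(hostGoarch, guestGoarch string) (string, error) {
	arch, err := limaArch(guestGoarch)
	if err != nil {
		return "", err
	}
	if hostGoarch == guestGoarch {
		// Native guest: only the arch line is pinned; vmType keeps the template default.
		return fmt.Sprintf("arch: %q\n", arch), nil
	}
	return fmt.Sprintf("vmType: %q\narch: %q\n", "qemu", arch), nil
}

func main() {
	lines, err := templateLines(runtime.GOARCH, "amd64")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(lines)
}
```

On an Apple Silicon host, the printed lines would go into a copy of the default template before starting it with limactl as usual.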
Well, that's pretty much it - hopefully, it was at least somewhat helpful :)
My work on iximiuz Labs continues, and I'm happy to share the key new features that were added since the last update a month ago:
As always, I'll include a complete report, including some juicy technical details, in the monthly round-up next week.
Traditional reminder: You can support the platform's development and get access to premium content, unlimited playground time, more powerful VMs, and insights into my creative process via Patreon and Discord updates. Every contribution matters!
Cheers
Ivan
Building labs.iximiuz.com - a place to help you learn Containers and Kubernetes the fun way