When Docker kicked off the great container wave, one of the big advantages of containers over virtual machines was their speed and small size. Software developers quickly started taking this further. How small could container images get? What could be stripped out, and what needs to remain for the container to be useful? This post will walk through the typical approaches in this space (minimal distributions, scratch and “distroless”), finishing with a look at Chainguard’s new, improved version of distroless.
The first point worth making is that size isn’t everything. In container images, size is often used as a proxy for a more useful measure: the number of components in an image, or its complexity. The less complex the image, and the fewer binaries and packages in it, the lower the risk of something going wrong or of the image being targeted by hackers.
One powerful way to reduce complexity is to cut down Linux distributions to their bare bones. The result is a base image that software teams can easily build on, installing additional packages with a package manager.
The two options I usually recommend for slimming down a base image are:
- Debian Slim. You can find “slim” variants of the official Debian images on Docker Hub, such as debian:11-slim, which clocks in at around 74 MB, compared to 118 MB for the full image. The savings come mainly from removing documentation and locale files.
- Alpine. Alpine uses the BusyBox tool suite and the musl C library, among other techniques, to cut things down much further, to around 5 MB!
All other things being equal, Alpine’s size savings make it the most appealing choice. The main exceptions are when required packages are available in Debian but not Alpine, or where glibc has to be used rather than musl due to application dependencies.
But we can still go further. Although these images are optimized for a small size, they still contain a full Linux distribution. Is a full distribution really what we want? When our image is running in production, we don’t need to compile binaries, install new packages or add new users — so why are we loading our images with software to do this?
A Distroless Philosophy
It’s possible to strip everything out and use a completely empty image (aka a “scratch” image) to host our application. This depends on being able to create a “static” binary that contains all its dependencies, including system libraries. Doing this typically results in an image only a few MB in size, depending on the size of the application. (Looking at some toy examples, a Rust CLI tool results in a 15.9 MB image and a Go hello world binary gets down to 2.07 MB, but a similar C program reduces the image size to an astonishing 452 bytes.)
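As a rough illustration of the kind of program that fits in a scratch image, here is a minimal Go sketch. It uses only the standard library and no cgo, so it can be linked statically. The file name, the binary name and the `greeting` helper are assumptions made for this example; it is not the program behind the sizes quoted above.

```go
// main.go: a small, dependency-free program that can be built as a static binary.
package main

import (
	"fmt"
	"os"
)

// greeting builds the message to print. It is a hypothetical helper,
// split out from main only so the logic is easy to test.
func greeting(name string) string {
	if name == "" {
		name = "world"
	}
	return "hello, " + name
}

func main() {
	// Use the first command-line argument as the name, if one was given.
	name := ""
	if len(os.Args) > 1 {
		name = os.Args[1]
	}
	fmt.Println(greeting(name))
}
```

Building with `CGO_ENABLED=0 go build -o hello main.go` should produce a statically linked binary on Linux. A Dockerfile that starts with `FROM scratch` and copies that binary in is then all that is needed to package it.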
In most cases though, you will find your application requires a few more things, such as:
- Root certificate data so that TLS connections can be trusted
- Core libraries like glibc or musl where static compilation isn’t possible or desirable
- Language runtimes such as the JRE, Python or Node.js
- Files and directories commonly used by libraries including /etc/passwd and /tmp
These requirements effectively gave rise to the distroless container philosophy. Distroless containers hold the minimum needed to get going: certificates and the base requirements for various stacks, with almost none of the typical Linux distribution features like a package manager or shell. Matt Moore, CTO of Chainguard, described some of these images, especially those that support runtimes like Java or Python, as “scratch for the rest of us”.
The first distroless images were produced by GoogleContainerTools, which has distroless images designed for running Go, Java, and Python applications. The Google images are produced using a bespoke process orchestrated by the Bazel build tool.
A common criticism of distroless images is that people want shells and other tooling to debug problems. A better approach to this issue is to use Kubernetes ephemeral containers. The ephemeral container is only run when debugging and can hold all these tools without adding to the size and complexity of the production container. Unfortunately, ephemeral containers are still in beta, and not available on all Kubernetes platforms just yet.
The most common request for Google images is to extend a given image with a package or library from the ecosystem. Doing this yourself is difficult. You can try using a multistage Dockerfile build and copying things over, but this is likely to cause problems with missing dependencies, and requires writing and maintaining your Dockerfile, and running docker build to produce your image. For Google’s Bazel-based distroless images, adding just one more package may involve learning Bazel, and operating a complex Bazel build process to produce your image.
Distroless: The Next Generation
Chainguard is building the next generation of distroless images, using a toolchain that makes it easy to compose distroless images from existing packages and to create custom packages. Unlike Google’s Bazel- and Debian-based system, our toolchain is centered around apk (the package manager used by Alpine), apko and melange. Apko (named after the ko build tool that inspired it) takes apk packages described in a config file and then solves the dependency set for installing the packages into the final image, so adding a new package only requires editing a single line of a YAML file. Melange is used to create new apk packages for use in apko, allowing easy creation of custom content in a declarative and reproducible way: if you rebuild the package, the binary will be the same.
Today, these tools can be used to build distroless images using apk-based distributions such as the Alpine, Adélie or Chimera Linux distributions, but we are also building our own companion GNU/Linux distribution to allow for our tooling to be used in situations where glibc is required.
This is the future of distroless and the path to minimal complexity.
Try out our new images at github.com/distroless.