Image sizes miss the point

Some engineers in the containers community have advocated for small images. These "small image" proponents favor base images such as distroless and alpine and image optimization tools such as Docker Slim. But small images can still be complex, and complexity, not size, is the true enemy. This blog post demonstrates that point by analyzing a real example. It builds on the All About That Base Image white paper and webinar, and offers broader insights into how images accrue technical debt and how that debt can be minimized.

How images are composed

First, let’s look at how images are built in theory.  In general, developers use Docker to compose images from a Dockerfile, which installs packages or runs build commands to produce an application.  These packages and build artifacts can be considered components of the final image that gets deployed.  Let’s take a look at a basic Dockerfile that uses Alpine:

FROM alpine:latest
RUN apk add nginx

This Dockerfile takes the latest Alpine image and installs the nginx package inside it.  But what is the actual technical debt of this image?  We can use syft to find out:

The output of syft from the image we built with Docker

According to syft, there are 16 packages in the image.  Each of these packages is a source of technical debt, despite the image being only 9 MB!
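Counting components this way is easy to automate.  Here is a minimal Python sketch that counts the packages in a syft JSON report (`syft <image> -o json`), whose top-level `artifacts` array lists one entry per package.  The sample data below is illustrative, not the real output for this image:

```python
import json

def count_components(syft_json: str) -> int:
    """Count packages in a syft JSON report (the 'artifacts' array)."""
    report = json.loads(syft_json)
    return len(report.get("artifacts", []))

# Illustrative sample mimicking syft's JSON shape, not real syft output
sample = json.dumps({
    "artifacts": [
        {"name": "alpine-baselayout", "version": "3.4.3", "type": "apk"},
        {"name": "busybox", "version": "1.36.1", "type": "apk"},
        {"name": "nginx", "version": "1.24.0", "type": "apk"},
    ]
})

print(count_components(sample))  # → 3
```

In CI, a script like this could consume `syft image:tag -o json` and fail the build when the component count exceeds an agreed budget.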

An analysis of eleven of the most popular base images (identified via GitHub code search in a previous white paper) suggests a similar finding. While the size of an image and the number of components do have a moderately strong positive correlation, there are instances of images that are approximately the same size (in MB) yet have drastically different numbers of components.

The number of components versus the size of an image; many small images have hundreds of components

To reduce debt, reduce image complexity not size

As illustrated above, images are built out of components layered on top of each other.  While increased image complexity typically does bring increased image size, the important metric is the number of underlying components.  The goal is to reduce the number of components that are put into an image during the authoring process.  This means that image size reduction tools often miss the point – scanners generally still have to deal with the same package set – while introducing their own technical and legal risks (a popular image optimization tool deletes license text from the images it optimizes, making those images not legally redistributable!).
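One common authoring technique that actually removes components is a multi-stage build, in which build-time dependencies never reach the final image.  A hedged sketch (the program and package names are illustrative, not from the example above):

```dockerfile
# Build stage: the compiler and headers live only here
FROM alpine:latest AS build
RUN apk add --no-cache gcc musl-dev
COPY hello.c .
RUN gcc -static -o /hello hello.c

# Runtime stage: starts from scratch, so only the binary ships
FROM scratch
COPY --from=build /hello /hello
ENTRYPOINT ["/hello"]
```

The build stage is discarded after the `COPY --from=build`, so a scanner pointed at the final image sees one component instead of the whole toolchain.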

As an example, we will compare the Alpine nginx image we built using Docker with one built using apko, which delivers the same functionality with only 9 components:

The output of syft for the same image, but built with apko
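For reference, an apko configuration for an nginx image looks roughly like the following.  This is a hedged sketch of apko's YAML format; the exact repository URL, package list, and entrypoint are assumptions, not the configuration used for the image above:

```yaml
contents:
  repositories:
    - https://dl-cdn.alpinelinux.org/alpine/edge/main
  packages:
    - nginx
entrypoint:
  command: /usr/sbin/nginx -g "daemon off;"
archs:
  - x86_64
```

Because apko builds the image directly from a declarative package list rather than running arbitrary Dockerfile commands, every component in the result is one you asked for by name.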

In closing, using tools such as apko or base images such as distroless helps to reduce the complexity of runtime images by reducing the number of components an image is ultimately composed of.  This translates to reduced technical debt for developers and security teams.
