It’s 2023, Is It Time for Unikernels on ARM Yet?[1]

The short answer is: yes it is. But since that answer doesn’t constitute a “real” article, let me expand on that a bit.

Niels Bergsma
Better Programming

Remember a few years back, when unikernels took the spotlight for a while? What happened? Well, unfortunately, for most of them, progress has considerably slowed down, or come to a full stop. Reviewing http://unikernel.org/, we see that the last blog article dates back to 2017 — not quite encouraging. So did it completely die off?

No, not at all. There are still a few projects around that survived the churn. We’ll be exploring NanoVMs[2] in this article. But before jumping straight in, let’s back up for a minute and provide some background on why this is relevant in 2023.

The What and Why of Unikernels

Imagine your application in all its beauty. Now imagine the cloud service it needs to run on. Strip away all the layers of accidental complexity and sprinkle on “just enough” OS/kernel to make it run. Voilà, you have a unikernel.

Or, to show it visually:

Typical setup vs Unikernel setup

Unikernels replace your host/guest OS and container engine, wrapping your application with a tiny kernel “bow” as a VM image. As a result, the whole image can be as small as 1.5MB (compressed).

As well documented elsewhere, the benefits of this are threefold:

  • Security: a reduced attack surface. Less stuff to patch and maintain.
  • Performance: only the bare minimum is included to run your application. It’s designed specifically for that. No noisy background processes, SSH agents, … As a result, a virtual machine instance typically boots up in milliseconds.
  • Simplicity: a smaller stack reduces the complexity of your deployment pipelines and infrastructure. The primitives cloud vendors provide are good enough for the rest.

[Side note] We tend to add layers and layers, and even more layers, of abstraction to get feature X. Is that necessary? Try to explain a typical production stack to a stranger at a party, and watch them flee for the door. “No, you need to have multiple layers of virtualization, otherwise it’s not flexible enough.” Word to the wise: don’t strike up casual conversations about production stacks; people aren’t into that sort of thing.

Meet NanoVMs

NanoVMs is a production-grade unikernel. The team behind it created the kernel from scratch, along with the tooling needed to deploy applications. The great thing is that it’s POSIX- and ELF-compatible, meaning it can run your application without any major rework. Just compile it, build an image, and deploy it. Any dependent libraries (e.g. libc) can be copied over.

About ARM

ARM isn’t explicitly tied to unikernels, or to NanoVMs for that matter. I thought it would be an interesting option to explore, that’s all. All mainstream cloud providers (Amazon, Microsoft, Google) have made ARM instances available in the last 1–2 years. According to their documentation, they provide improved price/performance. Plus, I strongly suspect we’ll see more ARM servers in the coming years.

For this article, I used Google’s Tau T2A series. An instance with 1 vCPU comes with 4GB of RAM. Spot instances are available[3] and cost around €8.50/month in my region. Not bad.

A Brief Study

Let’s roll up our sleeves and give this a go, shall we? To skip the boring part, I already wrote a simple API in Rust. You can find it here, together with a few instructions.
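
For context, a minimal version of such an API can be written with just the standard library. This sketch is not the repository’s actual code (which may use a proper HTTP framework); it simply answers every request with “okay”:

use std::io::{Read, Write};
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    // Listen on all interfaces; port 80 matches the health check we set up later.
    let listener = TcpListener::bind("0.0.0.0:80")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        // Read (and ignore) the incoming request; a real API would parse it.
        let mut buffer = [0u8; 1024];
        let _ = stream.read(&mut buffer)?;
        // Reply to every request with a plain-text "okay".
        let body = "okay";
        let response = format!(
            "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
            body.len(),
            body
        );
        stream.write_all(response.as_bytes())?;
    }
    Ok(())
}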

The plan

We’ll be building a tiny API and deploying it as an instance group, with a load balancer in front, on Google Cloud (GCP).

To try this at home, you need a GCP account (big surprise, I know). Create a VPC if there isn’t one already. Mine is called “default” because I’m good at naming things.

Let’s get started. Since I’m working on a 2015 MacBook, I need to cross-compile the source code for Linux. For this, I’ll be using cross, Rust’s cross-compilation tool. After cloning the repository and getting set up, start the compilation like so:

RUSTFLAGS='-C target-feature=+crt-static' cross build --release --target aarch64-unknown-linux-gnu

For those who are now frowning: yes, correct, this produces a statically linked binary. It simplifies our life for the moment.

Also, we could compile against musl libc, but as the NanoVMs team points out, expect slowdowns. Best to stick with regular (g)libc.

Next, create the image and upload it to GCP. Before we do, open up deployment/config.json and fill in the missing values.
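
For reference, mine ended up looking roughly like this. Treat it as a sketch: the values are placeholders, the field names are from my setup at the time, and the Ops documentation has the full list of supported options:

{
  "CloudConfig": {
    "ProjectID": "your-project-id",
    "Zone": "europe-west4-a",
    "BucketName": "your-image-bucket",
    "InstanceGroup": "api-instance-group"
  },
  "RunConfig": {
    "Ports": ["80"]
  }
}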

ops image create target/aarch64-unknown-linux-gnu/release/api -c deployment/config.json -i api-v1 -t gcp
ops instance create api-v1 -c deployment/config.json -t gcp

The second command will result in an error. That’s okay: the configuration refers to an instance group that doesn’t exist yet. But it does create an instance template for us, which we’ll need in the next step.

Now head over to Google Cloud Console, and create our instance group:

  • Name it “api-instance-group”
  • Select the template created in the previous step
  • Select a single zone that has T2A instance types, e.g. europe-west4-a
  • Include an HTTP health check for the path / at port 80.

Great stuff, but our instance group isn’t reachable from the internet yet. For that, we need to add an HTTPS load balancer. Let’s create one:

  • Name it api-load-balancer
  • Name the frontend api-frontend. Set the protocol to HTTPS. Upload an existing SSL certificate if you have one lying around, or create a new one. In the latter case, you may need to add a DNS record as well.
  • Add a backend service and point it to our instance group. Disable CDN capabilities and add another health check, with the same settings as before.
  • For routing rules, keep it at “Simple host and path rule”.
  • Grab a fresh cup of coffee while the load balancer is being created. It might take a few minutes.

When all check-marks light up green, curl the endpoint:

> curl https://your.endpoint.here/
< okay

Impressive, isn’t it? No OS to install, no software to configure, no troubleshooting for hours. It’s almost boring.

What if we want to update our incredible API? Open main.rs, change the word “okay” to “ok”, and rerun the commands:

RUSTFLAGS='-C target-feature=+crt-static' cross build --release --target aarch64-unknown-linux-gnu
ops image create target/aarch64-unknown-linux-gnu/release/api -c deployment/config.json -i api-v2 -t gcp
ops instance create api-v2 -c deployment/config.json -t gcp

Note that we switched to api-v2. Running the last command automatically updates the instance group. After the instances are updated, execute curl again and behold: it returns ok. It just leaves you speechless, doesn’t it?

Bonus / Homework

The code includes an endpoint that ingests and returns protobuf (over HTTP). Try to write a protobuf client for that.
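
As a rough starting point, here is a sketch of such a client in Rust, using the prost crate and reqwest’s blocking client. Note that the Echo message and the /echo path below are made up for illustration; mirror the actual schema and route from the repository instead.

use prost::Message;

// Hypothetical message type; replace it with the schema the repository defines.
#[derive(Clone, PartialEq, Message)]
struct Echo {
    #[prost(string, tag = "1")]
    text: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let request = Echo { text: "okay".to_string() };

    // Serialize the message and POST it as a protobuf body.
    let response = reqwest::blocking::Client::new()
        .post("https://your.endpoint.here/echo") // hypothetical route
        .header("Content-Type", "application/x-protobuf")
        .body(request.encode_to_vec())
        .send()?;

    // Decode the response with the same message type.
    let reply = Echo::decode(response.bytes()?.as_ref())?;
    println!("{}", reply.text);
    Ok(())
}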

Performance

By now you might be asking yourself, “but how does it actually perform?” Rightfully so, and it’s an excellent question. My exploration was limited to micro-benchmarks, where network latency was the dominant factor.

So I won’t share those here, since your mileage may vary. Nonetheless, I did compare it against Linux VM instances and x86 VM types, and the results are very promising.

Consider that, at the end of the day, these are micro-benchmarks. They don’t represent long-running production loads; for that, we’d need a bigger experiment. This is just testing the waters before jumping in. For an accurate comparison, run both side by side and instrument key properties (e.g. resource usage and response time).
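
If you want to try the response-time part yourself, a trivial sketch like this is enough to get going (again in Rust, with reqwest’s blocking client and a placeholder endpoint); proper instrumentation should also track resource usage on the instances:

use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let url = "https://your.endpoint.here/"; // placeholder endpoint
    let runs: u32 = 100;

    let start = Instant::now();
    for _ in 0..runs {
        // Fire a request and read the body fully, so we measure the whole exchange.
        client.get(url).send()?.bytes()?;
    }

    // Average over all runs; this includes network latency, so compare like for like.
    println!("average response time: {:?}", start.elapsed() / runs);
    Ok(())
}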

Afterthoughts

My experience with NanoVMs was really smooth. Things worked out of the box; I had my first “Hello World” service running within an hour. I didn’t encounter any show-stoppers, and the NanoVMs team was very helpful in answering all my queries.

I can safely conclude that:

  1. unikernels are still a thing.
  2. they deserve more attention. They support a range of runtime models, from (micro-)services and serverless to edge computing.
  3. they can simplify our production stacks.

Before you close this page and come up with numerous reasons why this would never work for you, let me ask you this: did you try it yourself? Go on, give it a spin.

In the next article, we’ll explore how to do domain-driven design in Cloudflare’s Workers (serverless + stateful) environment. Stick around.

Footnotes

[1] Honestly, it doesn’t have a whole lot to do with 2023; I just needed a catchy title.
[2] There are others out there, but I didn’t explore them in depth since NanoVMs came up first in my Google search.
[3] The example code uses regular provisioning. To use spot provisioning, set this flag in your Ops configuration.
