Docker, WSL, and oneAPI — A Quick How-To Guide

Optimizing a containerized oneAPI workload for deployment

TonyM
Better Programming


What is a container?

In my last blog, I spent some time talking about why being efficient as a developer matters to me. The proliferation of container technology can significantly help us achieve that goal.

First though, let’s quickly go through what a container is for those who don’t know. Docker is the most common containerization solution, so I’ll take the definition of a container from their website:

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.

Containers can run on Windows or Linux systems, are relatively easy to set up and deploy, and are easy to share or distribute to others (via services like Docker Hub).

Of course, there are some limitations to using containers as well. You have to set up a container runtime on every system where you want to run your containers; containers can get quite large, which may be an issue if you have to upload and download them often; and they don’t provide the hard separation of resources on the underlying system that a virtual machine (VM) would. Despite these limitations, containers are a great way to go for many use cases, so let’s talk about how they help us as developers.

What does this mean to a developer?

Using containers as part of your development process means you can create and maintain a stable, reproducible environment in which to develop software. If you ever corrupt your development environment, recovering is as easy as starting another instance of the container to get a clean environment to work in.

Another cool thing is that you can create and run multiple development and test environments (e.g., multiple Linux versions) on a single system. Most importantly, you generally don’t have to worry about messing up your host system, since the container isolates most resources.

For developers whose customers use containers, shipping your application in a container allows it to run in a more controlled environment. This can help reduce bugs, shrink your testing surface area, and improve your time to market (TTM), which helps both you and your customers.

There are other options available to solve some of the “multiple development environment” problems, such as virtual machine (VM)-based solutions. You can create a base Windows Subsystem for Linux (WSL) instance, configure your development environment in it, and deploy the instance multiple times. You could also use enterprise solutions like VMware Virtual Desktop Infrastructure (VDI), which are even more robust. However, these solutions aren’t available to everyone on all platforms, so for the purposes of this blog, I will focus on leveraging containers to help develop and deploy software using oneAPI.

Using the oneAPI Development Containers

To leverage the oneAPI development containers to provide the aforementioned stable development environment, we first need to set up Docker on our development system. In general, you can just go to Docker’s website and follow their installation instructions. You may remember from my previous blog that I’m using the WSL environment for my development, so it was slightly trickier.

I was going to do a simple write-up on this, but fortunately many people before me have done it. I found the story by ferarias to be the simplest to understand if you want to follow this approach.
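However you get Docker set up under WSL, a quick way to confirm the daemon is reachable from your WSL shell is the classic hello-world check:

> docker run hello-world

If that prints Docker’s welcome message, you’re good to go.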

Once Docker is set up, you can grab the latest and greatest oneAPI development containers from Docker Hub.

There are oneAPI development containers for developing oneAPI code on CentOS 8, Ubuntu 18.04, or Ubuntu 20.04. For this example, I’m going to build and package some simple SYCL code, so I’m going to pull the oneAPI Base Toolkit Ubuntu 20.04 image. Using the Docker command line, I just run:

> docker pull intel/oneapi-basekit:devel-ubuntu20.04

This downloads the container image to my local system. To get into the development environment, you can run the container using the following command:

> docker run -ti --name=ubuntu-dev-20.04 intel/oneapi-basekit:devel-ubuntu20.04

The -ti flags tell Docker to provide me with an interactive terminal in the container once it is up and running. The --name flag gives our running container the name ubuntu-dev-20.04.
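Before building anything, it’s worth a quick sanity check that the toolchain is actually visible inside the container. Assuming the devel image has the DPC++ compiler and SYCL tools on the PATH (which it sets up for you), something like this should work; exact tool names can vary a bit between Base Kit releases:

> icpx --version
> sycl-ls

icpx reports the oneAPI DPC++/C++ compiler version, and sycl-ls lists the SYCL devices the runtime can see.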

Building some code

To make things a bit simpler, I will be leveraging the oneAPI samples. I went to the oneAPI samples GitHub repository and cloned it in my development container. I’ll be using the Nbody sample located in the DirectProgramming->DPC++->N-BodyMethods->Nbody folder.
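If you want to follow along, cloning the samples inside the running container is the usual git workflow (the URL below is the public oneAPI samples repository; install git first if the image doesn’t already include it):

> git clone https://github.com/oneapi-src/oneAPI-samples.git
> cd oneAPI-samples/DirectProgramming/DPC++/N-BodyMethods/Nbody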

The nice thing about using the development container is I do NOT have to run

> source /opt/intel/oneapi/setvars.sh

as I did in my last example, because the oneAPI development container is built with that command already run. I can simply go to the Nbody folder and follow the instructions to build and run the Nbody example:

> mkdir build
> cd build
> cmake ..
> make
> make run

Here’s the result on my Intel i9 Alder Lake Alienware R13 system: the sample builds and runs without issue.

Creating a Production Container

Now that we know our code builds in the development container, let’s see how we can ship it in a container to a customer to run on their system.

Docker builds a new container image from a Dockerfile. For those of you who are not familiar with how to use a Dockerfile, check out Docker’s Getting Started guide.

I’m going to leverage the Docker multi-stage build capability to tell the docker build process to build my code in one container and then copy it to another container. This allows me to have the entire build and package workflow in a single place, while still allowing me to ship a container without all the development tools. Here’s my production container Dockerfile:
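The file below is a minimal sketch of that multi-stage flow rather than the exact file I used; the binary path and the assumption that the Nbody source is the build context are mine, so adjust them to match your layout.

# Stage 1: build the Nbody sample inside the full oneAPI development image
FROM intel/oneapi-basekit:devel-ubuntu20.04 AS build
WORKDIR /src
# Assumes the Nbody sample source is the Docker build context
COPY . .
# The devel image already has the oneAPI environment set up; if cmake is
# missing from your image, install it first
RUN mkdir build && cd build && cmake .. && make

# Stage 2: ship only the built binary on top of the runtime image
FROM intel/oneapi-runtime:latest
WORKDIR /app
# The path to the built binary is an assumption; check your build tree
COPY --from=build /src/build/src/nbody /app/nbody
CMD ["/app/nbody"]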

You’ll see that for the production container I’m using the intel/oneapi-runtime:latest container as a base because it provides all the runtimes required to run any oneAPI-based workload.

The next step is to take this Dockerfile, which I named Dockerfile.runtime, and run the docker build command to create my new container.

> docker build . -f Dockerfile.runtime -t tonymintel/nbody:runtime

This command tells Docker to build using the current path as the build context, with my Dockerfile.runtime file, and to tag the image as tonymintel/nbody:runtime.

A successful docker build! Yay!

A quick run of the container shows that our build completed successfully and is running our code as we want.

Now I just need to package up my container and ship it to my customer. I can push it up to Docker Hub or use the docker save command to save the image on my local system and send the file.
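If a registry isn’t an option, the save/load pair works fine for a hand-off: docker save writes the image to a tar file on my machine, and docker load imports it on the customer’s machine.

> docker save -o nbody-runtime.tar tonymintel/nbody:runtime
> docker load -i nbody-runtime.tar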

In my case, I’m just pushing it up to Docker Hub:

> docker push tonymintel/nbody:runtime

This pushes the image to my tonymintel Docker Hub account and saves it to the nbody repository with the tag runtime. Now anyone with access to the tonymintel/nbody Docker Hub repository can pull and run the code via regular Docker commands.
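On the customer’s side, that’s just the standard pull-and-run pair:

> docker pull tonymintel/nbody:runtime
> docker run -it tonymintel/nbody:runtime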

Tagged nbody repo on Docker Hub

Making my production container 4x smaller

Just as being efficient as a developer is important to me, having an efficient solution is important to users of our applications. As my colleague James Reinders says, we need to make sure we bring them “Joy Out of the Box.”

As you can see on Docker Hub, the container image I’ve uploaded is 1.32GB. That’s a pretty big Docker container for my customer to download.
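(Docker Hub reports the compressed size; the local docker images listing shows the uncompressed size, so the two numbers won’t match exactly, but either way it’s a lot of bytes to move around.)

> docker images tonymintel/nbody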

My production container is based on the intel/oneapi-runtime container, which provides all the runtimes for all toolkits that build oneAPI code. This isn’t necessarily a bad thing. oneAPI provides a variety of libraries that I may want to integrate with my code, so having a runtime package that I know will run my code is great. However, since I’m the developer and I know what libraries I used, let’s see if we can optimize our production container to make it smaller.

The intel/oneapi-runtime:latest Docker Hub file as of June 2022

After some inspection, I created a new Dockerfile.prod that looks like this:
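What follows is a sketch of that file rather than a verbatim copy (the repository key URL and apt source line are the ones Intel documents for its oneAPI apt repository, so verify them against the current oneapi-runtime Dockerfile), and the line numbers in the explanation below refer to my original file.

# Stage 1: same build stage as before
FROM intel/oneapi-basekit:devel-ubuntu20.04 AS build
WORKDIR /src
COPY . .
RUN mkdir build && cd build && cmake .. && make

# Stage 2: plain Ubuntu plus only the DPC++ runtime package
FROM ubuntu:20.04
# Boilerplate: register Intel's apt repository (verify the key URL and repo line)
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ca-certificates curl gnupg && \
    curl -fsSL https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | apt-key add - && \
    echo "deb https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/oneAPI.list && \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends intel-oneapi-runtime-dpcpp-cpp && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
# The path to the built binary is an assumption; check your build tree
COPY --from=build /src/build/src/nbody /app/nbody
CMD ["/app/nbody"]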

There’s a lot of text here, so let me try to explain what is going on:

  1. Lines 1–9 and 32–34 are the same in both Dockerfiles. They build our binary and run it when the container starts.
  2. Lines 11–30 are mostly boilerplate that sets up Intel’s apt repositories and then installs some Intel packages via APT. These lines are pulled from the official oneapi-runtime Dockerfile.

The key line in my Dockerfile.prod is line 28. If you look at the original intel/oneapi-runtime Dockerfile (on Docker Hub, click on a digest SHA and find the biggest image layer), it does an apt-get install of all the possible runtime packages. In my case, I know I only used DPC++, so my Dockerfile.prod installs only the intel-oneapi-runtime-dpcpp-cpp package.

Now I just build my new Dockerfile, verify it works as before, and push it to the cloud.

> docker build . -f Dockerfile.prod -t tonymintel/nbody:production
> docker run -it tonymintel/nbody:production
> docker push tonymintel/nbody:production

Now I see a ~4x improvement in download size for my customer…hopefully a little more Joy Out of the Box.

You might ask: can I do even better? To avoid suspense, the answer is yes, but it involved going in and pulling individual files to omit various unused runtime components. It’s a little cumbersome, but it is possible. If you’re curious, it ended up making the package about 250MB, for a ~5x improvement.
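If you want to try that yourself, the natural starting point is asking the binary which shared libraries it actually loads, and then copying only those (plus any plugins SYCL loads dynamically, which ldd won’t show) into a bare base image. The binary path here is the same assumed location as above:

> ldd ./build/src/nbody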

I consider this a big win because it can really affect your customer’s experience. Obviously, if your customer needs to download a container over a standard internet connection, this matters. In a large-scale environment like a data center or cloud, it may be even more valuable, since deploying the same pod across tens or hundreds of nodes is very common and network usage, saturation, and latency are always concerns. Perhaps most importantly, if your customer is paying for the network or storage used to deploy their containers, they’ll thank you (or at least won’t complain about what the container is costing them). :)

Conclusion

Containers are a useful technology to help us as developers build and test our code. Depending on your customer (that is, whether they will use a container), they can also really help us ship more reliable code.

There’s some work that goes into learning all of this technology, but in many cases it is well worth it. I hope this helped you learn a little about how to leverage containers and how to optimize those containers for your customer.

Until next time!

Want to Connect?
If you want to see what random tech news I’m reading, you can follow me on Twitter.

Tony is a Software Architect and Technical Evangelist at Intel. He has worked on several software developer tools and most recently led the software engineering team that built the data center platform which enabled Habana’s scalable MLPerf solution.

Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.

