
Kubernetes — Running Multiple Container Runtimes

Ivan Sim · Published in ITNEXT · 7 min read · Aug 16, 2021


In this post, I want to show you how to run multiple OCI container runtimes on Kubernetes. You will see how to configure containerd to run both runC and Kata Containers. Then we will use the Kubernetes RuntimeClass API to let workloads choose between the different container runtimes.

Why Different Container Runtimes

When multiple tenants share a cluster, the heterogeneous nature of the workloads usually implies different execution and data trust boundaries. It’s not uncommon for such a cluster to own a set of trusted central services required for the management and operation of the cluster while also hosting “untrusted” workloads owned by the different tenants. While the common containerization approach, which relies on Linux namespaces and cgroups, might be suitable for running the trusted workloads, stronger isolation based on hypervisor-backed containerization may be better suited for mitigating the threat models associated with the untrusted workloads.

Another example involves GPU workloads where hypervisor-based container runtimes can be used to enable GPU passthrough and GPU mediated passthrough. Single Root I/O Virtualization (SR-IOV) and high performance user-mode applications are also better served by non-traditional container runtimes.

Kubernetes provides the RuntimeClass API to allow workloads to select the container runtimes best suited for their requirements. This resource was first introduced in Kubernetes 1.12 as a Custom Resource Definition (CRD). It was later implemented as a built-in cluster resource in Kubernetes 1.14.

About Kata Containers

Kata Containers is an open source container runtime that runs container workloads on lightweight virtual machines. It utilizes hardware virtualization technology to enforce strong workload isolation. Workloads are run with a dedicated minimal guest Linux kernel and a guest image based on Clear Linux. This deployment model ensures that containerized processes no longer have access to the host kernel, which simplifies the security policies needed on the host to guard against container exploitation.

Traditional containers vs. Kata containers

Kata Containers is OCI-compatible and integrates with containerd through a shim that implements the containerd Runtime V2 API. It utilizes Linux Traffic Control to redirect traffic between the container’s veth interface and the virtual machine’s TAP interface. For more information on the Kata Containers architecture, see its documentation.

And with that, let’s move on to setting up and configuring Kubernetes to work with runC and Kata Containers 🚢🚢🚢!

Provision Kubernetes Cluster

In my setup, I provisioned a Kubernetes v1.22.0 cluster using kubeadm. The version of containerd used in my cluster is 1.4.9.

The remainder of this section will only highlight relevant installation and configuration steps. Detailed information on using kubeadm to provision Kubernetes can be found in the Kubernetes documentation, along with important information on installing containerd.

👷 The containerd.io package can be installed without needing the docker-ce and docker-ce-cli packages.
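
For example, on Ubuntu 20.04 the package can be pulled from Docker’s apt repository on its own. This is a sketch; the repository setup follows Docker’s standard installation instructions:

    # Add Docker's apt repository, then install only containerd.io --
    # no docker-ce or docker-ce-cli packages are required
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository \
      "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
    sudo apt-get update
    sudo apt-get install -y containerd.io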

My cluster is made up of 3 DigitalOcean droplets with 4GB of memory and 2 CPUs, running Ubuntu 20.04:

  • k8s-control-plane hosts the Kubernetes control plane
  • k8s-worker uses runC to serve trusted workloads
  • k8s-worker-untrusted uses both runC and Kata Containers to serve workloads, with untrusted workloads designated to Kata Containers

I used Calico as the CNI plugin to support pod networking.

Prior to using the kubeadm init command to initialize the control plane, let’s modify containerd’s configuration file at /etc/containerd/config.toml on each node.

🔧 The kata-deploy tool is an easy way to install Kata Containers on Kubernetes. For the purpose of demonstration, I will be manually configuring containerd and installing Kata Containers in this post.

Configure containerd With Kata Containers

📝 All subsequent code examples require direct SSH access to the cluster nodes, and permissions to modify containerd’s configuration file on the nodes.

Use the containerd config default command to re-generate containerd’s default configuration on all the nodes:

Re-generate containerd’s default configuration
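
A minimal sketch of that step, overwriting the existing file in place:

    # Write containerd's built-in default configuration to the config file
    containerd config default | sudo tee /etc/containerd/config.toml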

On the k8s-worker-untrusted node where Kata Containers will be installed, patch containerd’s configuration file as follows:

Patch containerd’s configuration file to include the kata shimv2
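
The patch is not reproduced verbatim here, but on containerd 1.4 the relevant section of /etc/containerd/config.toml ends up looking roughly like this (the runc entry is already present in the default configuration; only the kata table is new):

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
      # Default handler generated by `containerd config default`
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
        runtime_type = "io.containerd.runc.v2"

      # Additional handler for Kata Containers, referenced later by the
      # RuntimeClass resource
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
        runtime_type = "io.containerd.kata.v2"
        privileged_without_host_devices = true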

This patch extends containerd’s cri plugin with the kata handler. The name of this handler will be referenced in the RuntimeClass resource specification later, as explained in the Kubernetes documentation.

The runtime_type property is used by containerd to identify the shim needed to interact with the underlying OCI runtime. containerd derives the shim’s binary name from the runtime_type by replacing the reverse-domain prefix with containerd-shim and appending the version. For example, io.containerd.kata.v2 is translated to containerd-shim-kata-v2, io.containerd.runc.v1 becomes containerd-shim-runc-v1, and so on.

The containerd-shim-kata-v2 binary implements the containerd Runtime V2 API. Through this shim, Kubernetes is able to instruct Kata Containers to launch pod sandboxes and OCI-compatible containers.

Setting the privileged_without_host_devices property to true tells containerd not to give privileged Kata containers direct access to the host’s devices.

📝 This patch intentionally leaves runC enabled, to show that a node is capable of hosting multiple container runtimes.

After the patch is applied successfully, restart containerd using systemctl:

Restart containerd using systemctl
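
For example:

    # Restart containerd so the new runtime handler takes effect
    sudo systemctl restart containerd
    sudo systemctl status containerd --no-pager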

Install Kata Containers On The Untrusted Node

Install Kata Containers 2.1.1 on the k8s-worker-untrusted node using snap:

Install Kata Containers on the untrusted node
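
A sketch of the snap installation; the exact channel carrying version 2.1.1 may differ, and classic confinement is required so the runtime can reach the host:

    sudo snap install kata-containers --classic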

Use the kata-containers.runtime CLI to ensure that the k8s-worker-untrusted node can run Kata Containers:

Ensure that the untrusted node can run Kata Containers
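
For example, the check subcommand verifies that the node meets the hardware virtualization prerequisites:

    sudo kata-containers.runtime check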

Initialize The Kubernetes Control Plane

Initialize the Kubernetes control plane on the k8s-control-plane node with the kubeadm init command:

Initialize the Kubernetes control plane
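
A sketch of the init command; the pod CIDR shown here is Calico’s default, and the CRI socket flag assumes the stock containerd installation path:

    sudo kubeadm init \
      --cri-socket=/run/containerd/containerd.sock \
      --pod-network-cidr=192.168.0.0/16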

On the k8s-worker and k8s-worker-untrusted nodes, use the kubeadm join command to join the workers to the control plane:

Join the workers to the control plane
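
The token and CA certificate hash below are placeholders; the actual values are printed by kubeadm init:

    # Run on both k8s-worker and k8s-worker-untrusted
    sudo kubeadm join <control-plane-ip>:6443 \
      --token <token> \
      --discovery-token-ca-cert-hash sha256:<hash>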

Confirm that all the nodes are healthy:

Confirm that all nodes are healthy
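
For example:

    kubectl get nodes -o wide

All three nodes should report a Ready status.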

All subsequent kubectl commands use the default kubeconfig generated by kubeadm, which can be found in the /etc/kubernetes folder of the k8s-control-plane node.
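
For example, on the control plane node:

    # Point kubectl at the admin kubeconfig generated by kubeadm
    export KUBECONFIG=/etc/kubernetes/admin.conf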

Schedule The Untrusted Workload

To ensure that all untrusted workloads are scheduled on the k8s-worker-untrusted node, we will taint the node and label it with the arbitrary example.org/workload=untrusted key/value pair:

Taint and label the k8s-worker-untrusted node
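
A sketch of those commands; the NoSchedule effect on the taint is an assumption:

    # Keep non-tolerating pods off the node, and label it so untrusted pods
    # can select it
    kubectl taint nodes k8s-worker-untrusted example.org/workload=untrusted:NoSchedule
    kubectl label nodes k8s-worker-untrusted example.org/workload=untrusted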

Create the kata RuntimeClass resource:

Create the kata RuntimeClass resource
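
A sketch of the resource. The handler name must match the kata runtime handler configured in containerd earlier; the scheduling section, which steers kata pods onto the tainted and labelled node, is an assumption based on the taint and label applied above:

    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: kata
    # Must match the containerd runtime handler name
    handler: kata
    scheduling:
      nodeSelector:
        example.org/workload: untrusted
      tolerations:
      - key: example.org/workload
        operator: Equal
        value: untrusted
        effect: NoSchedule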

Create the “untrusted” Deployment resource, where the pod consists of a curl container and an nginx container:

Deploy the untrusted workload
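
A sketch of the Deployment; the container images and the sleep command that keeps the curl container alive are assumptions:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-untrusted
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx-untrusted
      template:
        metadata:
          labels:
            app: nginx-untrusted
        spec:
          # Run this pod with the Kata Containers handler
          runtimeClassName: kata
          containers:
          - name: nginx
            image: nginx
          - name: curl
            image: curlimages/curl
            command: ["sleep", "86400"]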

Confirm that the nginx-untrusted deployment is rolled out successfully:

Confirm that the untrusted workload is rolled out successfully
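
For example:

    kubectl rollout status deployment/nginx-untrusted
    kubectl get pods -o wide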

Examine The QEMU Process

Let’s examine the QEMU process of the pod on the k8s-worker-untrusted node:

Check the QEMU process of the pod
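
For example:

    # On k8s-worker-untrusted: expect a single qemu process for the pod
    ps -ef | grep qemu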

There should be only one QEMU process, even though the pod is running 2 containers. Information on the loaded devices and path to the vmlinuz kernel can be seen in the process arguments.

In my setup, the vmlinuz-5.10.25.container guest kernel was about 5.2MB in size. In comparison, the vmlinuz-5.4.0-80-generic host kernel on the same droplet was about 12MB. This small kernel makes it relatively fast to spin up new pods.

Access The Guest VM Console

The kata-containers.runtime CLI has an exec command which provides a mechanism to enter into the guest VM via a debug console.

⚠️ The default Clear Linux image may not return a tty with full shell access. See this GitHub issue.

To use this feature, enable the kata agent’s debug_console_enabled property in the /etc/kata-runtime/configuration.toml configuration file:

Enable the kata agent’s debug_console_enabled property
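
The relevant snippet of /etc/kata-runtime/configuration.toml looks roughly like this:

    [agent.kata]
      debug_console_enabled = true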

Start the kata-monitor process on the k8s-worker-untrusted node:

Start the kata-monitor process
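
For example (under a snap installation the binary may be namespaced, e.g. kata-containers.kata-monitor):

    # Run the monitor in the foreground
    sudo kata-monitor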

Then execute the kata-containers.runtime exec command with the sandbox ID as the argument:

Use the ‘kata-containers.runtime exec’ command to access the guest VM console
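
A sketch of that step, assuming crictl on the node is configured to talk to containerd; the sandbox ID placeholder must be replaced with the POD ID reported by crictl:

    # List pod sandboxes and note the POD ID of the nginx-untrusted pod
    sudo crictl pods
    # Open a debug console inside the pod's guest VM
    sudo kata-containers.runtime exec <sandbox-id>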

🤔 How many virtual machines do you think we will end up with if we scale the nginx-untrusted workload to 3 replicas? Will we end up with 3 virtual machines? Or will the 6 containers end up sharing the same virtual machine? How long do you think the scaling operation will take?

Communication Between The Trusted And Untrusted Workloads

As a final test, we will deploy a similar Deployment resource with the same pod specification. This workload will play the role of our “trusted” workload:

Deploy the trusted workload
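
A sketch of the trusted Deployment; the nginx-trusted name and the images are assumptions. The pod spec mirrors the untrusted one except that it omits runtimeClassName, so the pod falls back to runC and lands on the untainted k8s-worker node:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-trusted
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx-trusted
      template:
        metadata:
          labels:
            app: nginx-trusted
        spec:
          containers:
          - name: nginx
            image: nginx
          - name: curl
            image: curlimages/curl
            command: ["sleep", "86400"]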

This is what my default namespace looks like after scaling the untrusted workload and deploying the trusted workload:

The ‘default’ namespace with both trusted and untrusted workload

Notice that all the trusted pods are scheduled to run on the k8s-worker node while the untrusted ones are on the k8s-worker-untrusted node.

At this point, there is nothing to enforce the network boundaries between the trusted and untrusted workloads. Pods can freely talk to each other.

For example, I can use the trusted curl to reach the untrusted nginx:

Trusted curl can reach the untrusted nginx
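
A sketch of the lookup and request, assuming the deployment and label names used in the sketches above:

    # Resolve the untrusted nginx pod's IP and curl it from the trusted pod
    UNTRUSTED_IP=$(kubectl get pods -l app=nginx-untrusted -o jsonpath='{.items[0].status.podIP}')
    kubectl exec deploy/nginx-trusted -c curl -- curl -s -o /dev/null -w '%{http_code}\n' "http://$UNTRUSTED_IP"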

I can also use the untrusted curl to reach the trusted nginx:

Untrusted curl can reach the trusted nginx
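
The same check in the opposite direction:

    # Resolve the trusted nginx pod's IP and curl it from the untrusted pod
    TRUSTED_IP=$(kubectl get pods -l app=nginx-trusted -o jsonpath='{.items[0].status.podIP}')
    kubectl exec deploy/nginx-untrusted -c curl -- curl -s -o /dev/null -w '%{http_code}\n' "http://$TRUSTED_IP"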

The task of deploying NetworkPolicy resources to restrict the traffic flow between the trusted and untrusted domains is left as an exercise for the reader.

Conclusion

In this post, we provisioned a Kubernetes v1.22.0 cluster on DigitalOcean using kubeadm. We designated one node to serve trusted workloads and another to serve untrusted workloads. Prior to initializing the cluster, we manually patched containerd’s configuration file and installed Kata Containers on the untrusted node. In a real setup, the kata-deploy tool would be a better choice for deploying Kata Containers on Kubernetes.

Then we deployed trusted and untrusted workloads onto the cluster. By using proper taints and labels, all the untrusted workloads were scheduled to run on the untrusted node. Although the trusted and untrusted workloads were served by different container runtimes, they were able to communicate with each other. The task of deploying NetworkPolicy resources to enforce network boundaries between the trusted and untrusted domains is left as an exercise for the reader.
