Kubernetes — Running Multiple Container Runtimes
In this post, I want to show you how to run multiple OCI container runtimes on Kubernetes. You will see how to configure containerd to run both runC and Kata Containers. Then we will use the Kubernetes RuntimeClass API to let workloads choose between the container runtimes.
Why Different Container Runtimes
When multiple tenants share a cluster, the heterogeneous nature of the workloads usually implies different execution and data trust boundaries. It’s not uncommon for such a cluster to own a set of trusted central services required for the management and operation of the cluster while also hosting “untrusted” workloads owned by the different tenants. While the common containerization approach, which relies on Linux namespaces and cgroups, might be suitable for running the trusted workloads, stronger workload isolation using hypervisor-based containerization technology may be better for mitigating the threat models associated with supporting the untrusted workloads.
Another example involves GPU workloads, where hypervisor-based container runtimes can be used to enable GPU passthrough and GPU mediated passthrough. Single Root I/O Virtualization (SR-IOV) and high-performance user-mode applications are also better served by non-traditional container runtimes.
Kubernetes provides the RuntimeClass API to allow workloads to select the container runtime best suited for their requirements. This resource was first introduced in Kubernetes 1.12 as a Custom Resource Definition (CRD). It was later implemented as a built-in cluster resource in Kubernetes 1.14.
About Kata Containers
Kata Containers is an open source container runtime that runs container workloads on lightweight virtual machines. It utilizes hardware virtualization technology to enforce strong workload isolation. Workloads run with a dedicated minimal guest Linux kernel and a guest image based on Clear Linux. This deployment model ensures that containerized processes no longer have access to the host kernel, and it simplifies the security policies needed on the host kernel to guard against container exploitation.

Kata Containers is OCI-compatible and works with containerd via a shim that implements the containerd Runtime V2 API. It utilizes Linux Traffic Control to redirect traffic between the container’s veth interface and the virtual machine’s TAP interface. For more information on Kata Containers’ architecture, see its documentation here.
And with that, let’s move on to setting up and configuring Kubernetes to work with runC and Kata Containers 🚢🚢🚢!
Provision Kubernetes Cluster
In my setup, I provisioned a Kubernetes v1.22.0 cluster using kubeadm. The version of containerd used in my cluster is 1.4.9.
The remainder of this section will only highlight relevant installation and configuration steps. Detailed information on using kubeadm to provision Kubernetes can be found in the Kubernetes documentation, along with important information on installing containerd.
👷 The containerd.io package can be installed without needing the docker-ce and docker-ce-cli packages.
My cluster is made up of 3 DigitalOcean droplets with 4GB of memory and 2 CPUs, running Ubuntu 20.04:
- k8s-control-plane hosts the Kubernetes control plane
- k8s-worker uses runC to serve trusted workloads
- k8s-worker-untrusted uses both runC and Kata Containers to serve workloads, with untrusted workloads designated to Kata Containers

I used Calico as the CNI plugin to support pod networking.
Prior to using the kubeadm init command to initialize the control plane, let’s modify containerd’s configuration file at /etc/containerd/config.toml on each node.
🔧 The kata-deploy tool is an easy way to install Kata Containers on Kubernetes. For the purpose of demonstration, I will be manually configuring containerd and installing Kata Containers in this post.
Configure containerd With Kata Containers
📝 All subsequent code examples require direct SSH access to the cluster nodes, and permissions to modify containerd’s configuration file on the nodes.
Use the containerd config default command to re-generate containerd’s default configuration on all the nodes.
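Something like the following, assuming containerd was installed via the containerd.io package and uses the default configuration path:
```sh
# Back up any existing configuration first, then regenerate the defaults
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
```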
On the k8s-worker-untrusted node where Kata Containers will be installed, patch containerd’s configuration file.
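A sketch of the relevant excerpt, following containerd 1.4’s cri plugin layout (verify the TOML table names against your generated config):
```toml
# /etc/containerd/config.toml (excerpt)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
  # containerd resolves this to the containerd-shim-kata-v2 binary
  runtime_type = "io.containerd.kata.v2"
  # do not expose host devices to privileged kata containers
  privileged_without_host_devices = true
```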
This patch extends containerd’s cri plugin with the kata handler. The name of this handler will be referenced in the RuntimeClass resource specification later, as explained in the Kubernetes documentation.
The runtime_type property is used by containerd to identify the shim needed to interact with the underlying OCI runtime. containerd translates the runtime_type into the shim’s binary name by combining the handler name and version with the containerd-shim prefix. For example, io.containerd.kata.v2 is translated to containerd-shim-kata-v2, io.containerd.runc.v1 becomes containerd-shim-runc-v1, etc.
The containerd-shim-kata-v2 shim implements the containerd Runtime V2 API. Through this shim, Kubernetes will be able to instruct Kata Containers to launch pods and OCI-compatible containers.
Setting the privileged_without_host_devices property to true configures containerd to not give privileged kata containers direct access to the host devices.
📝 This patch purposely did not disable runC, to show that a node is capable of hosting multiple container runtimes.
After the patch is applied successfully, restart containerd using systemctl.
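Restarting the daemon makes it pick up the new runtime handler:
```sh
sudo systemctl restart containerd
sudo systemctl status containerd --no-pager
```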
Install Kata Containers On The Untrusted Node
Install Kata Containers 2.1.1 on the k8s-worker-untrusted node using snap.
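A sketch of the installation; the exact channel that pins version 2.1.1 is an assumption, so check snap info kata-containers for the channels available to you:
```sh
# --classic is required because Kata Containers needs broad access to the host
sudo snap install kata-containers --classic
```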
Use the kata-containers.runtime CLI to ensure that the k8s-worker-untrusted node can run Kata Containers.
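For example, with the check subcommand, which verifies that the host supports the required hardware virtualization features:
```sh
sudo kata-containers.runtime check
```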
Initialize The Kubernetes Control Plane
Initialize the Kubernetes control plane on the k8s-control-plane node with the kubeadm init command.
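A sketch, assuming Calico’s default pod CIDR of 192.168.0.0/16:
```sh
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
```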
On the k8s-worker and k8s-worker-untrusted nodes, use the kubeadm join command to join the workers to the control plane.
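The exact token and CA certificate hash are printed by kubeadm init; the values below are placeholders:
```sh
sudo kubeadm join <control-plane-ip>:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
```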
Confirm that all the nodes are healthy.
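A quick check; all nodes should eventually report a Ready status:
```sh
kubectl get nodes -o wide
```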
All subsequent kubectl commands use the default kubeconfig generated by kubeadm, which can be found in the /etc/kubernetes folder of the k8s-control-plane node.
Schedule The Untrusted Workload
To ensure that all untrusted workloads will be scheduled on the k8s-worker-untrusted node, we will taint and label the node with the arbitrary example.org/workload=untrusted key-value pair.
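A sketch of the taint and label commands; the NoSchedule effect keeps pods without a matching toleration off the node:
```sh
# Repel pods that do not tolerate the untrusted taint
kubectl taint nodes k8s-worker-untrusted example.org/workload=untrusted:NoSchedule
# Label the node so the RuntimeClass scheduling rules can select it
kubectl label nodes k8s-worker-untrusted example.org/workload=untrusted
```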
Create the kata RuntimeClass resource.
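A sketch of the resource; the handler must match the name configured in containerd, and the scheduling section steers kata pods onto the tainted node:
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
# must match the handler name in containerd's config
handler: kata
scheduling:
  nodeSelector:
    example.org/workload: untrusted
  tolerations:
  - key: example.org/workload
    operator: Equal
    value: untrusted
    effect: NoSchedule
```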
Create the “untrusted” Deployment resource, where the pod is composed of a curl container and an nginx container.
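A sketch of the manifest; the image tags and the sleep command used to keep the curl container alive are assumptions:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-untrusted
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-untrusted
  template:
    metadata:
      labels:
        app: nginx-untrusted
    spec:
      # run this pod with the kata RuntimeClass defined above
      runtimeClassName: kata
      containers:
      - name: nginx
        image: nginx:1.21
      - name: curl
        image: curlimages/curl:7.78.0
        command: ["sleep", "86400"]
```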
Confirm that the nginx-untrusted deployment is rolled out successfully.
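For example:
```sh
kubectl rollout status deployment/nginx-untrusted
kubectl get pods -o wide
```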
Examine The QEMU Process
Let’s examine the QEMU process of the pod on the k8s-worker-untrusted node.
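For example; the QEMU binary name may differ depending on the Kata configuration:
```sh
ps -ef | grep qemu | grep -v grep
```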
There should be only one QEMU process, even though the pod is running 2 containers. Information on the loaded devices and the path to the vmlinuz kernel can be seen in the process arguments.
In my setup, the vmlinuz-5.10.25.container guest kernel was about 5.2MB in size. In comparison, the vmlinuz-5.4.0-80-generic host kernel on the same droplet was about 12MB. This small kernel makes it relatively fast to spin up new pods.
Access The Guest VM Console
The kata-containers.runtime CLI has an exec command which provides a mechanism to enter the guest VM via a debug console.
⚠️ The default Clear Linux image may not return a tty with full shell access. See this GitHub issue.
To use this feature, enable the kata agent’s debug_console_enabled property in the /etc/kata-runtime/configuration.toml configuration file.
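The excerpt below reflects the property named above; the section name follows Kata 2.x’s configuration layout:
```toml
# /etc/kata-runtime/configuration.toml (excerpt)
[agent.kata]
debug_console_enabled = true
```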
Start the kata-monitor process on the k8s-worker-untrusted node.
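For example; with the snap install, the exact binary path may differ (e.g. under /snap/kata-containers/current):
```sh
sudo kata-monitor
```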
Then execute the kata-containers.runtime exec command with the sandbox ID as the argument.
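The sandbox ID below is a placeholder; one way to find it is from the QEMU process arguments or from crictl pods on the node:
```sh
sudo kata-containers.runtime exec <sandbox-id>
```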
🤔 How many virtual machines do you think we will end up with if we scale the nginx-untrusted workload to 3 replicas? Will we end up with 3 virtual machines? Or will the 6 containers end up sharing the same virtual machine? How long do you think the scaling operation will take?
Between The Trusted And Untrusted Workloads
As a final test, we will deploy a similar Deployment resource with the same pod specification. This workload will play the role of our “trusted” workload.
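A sketch of the manifest; the nginx-trusted name is an assumption, and omitting runtimeClassName makes the pod fall back to the default runtime, runC:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-trusted
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-trusted
  template:
    metadata:
      labels:
        app: nginx-trusted
    spec:
      # no runtimeClassName - this pod uses the default runtime (runC)
      containers:
      - name: nginx
        image: nginx:1.21
      - name: curl
        image: curlimages/curl:7.78.0
        command: ["sleep", "86400"]
```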
This is what my default namespace looks like after scaling the untrusted workload and deploying the trusted workload.
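For example:
```sh
kubectl get pods -o wide
```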
Notice that all the trusted pods are scheduled to run on the k8s-worker node while the untrusted ones are on the k8s-worker-untrusted node.
At this point, there is nothing to enforce the network boundaries between the trusted and untrusted workloads. Pods can freely talk to each other.
For example, I can use the trusted curl to reach the untrusted nginx.
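A sketch; the pod IP is a placeholder, discoverable with kubectl get pods -o wide:
```sh
kubectl exec deploy/nginx-trusted -c curl -- curl -s http://<untrusted-pod-ip>
```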
I can also use the untrusted curl to reach the trusted nginx.
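And in the other direction:
```sh
kubectl exec deploy/nginx-untrusted -c curl -- curl -s http://<trusted-pod-ip>
```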
The task of deploying NetworkPolicy resources to restrict the traffic flow between the trusted and untrusted domains will be left as an exercise for the reader.
Conclusion
In this post, we provisioned a Kubernetes v1.22.0 cluster on DigitalOcean using kubeadm. We designated one node to serve trusted workloads, and another to serve untrusted workloads. Prior to initializing the cluster, we manually patched containerd’s configuration file and installed Kata Containers on the untrusted node. In a real setup, the kata-deploy tool will be a better choice for deploying Kata Containers on Kubernetes.
Then we deployed trusted and untrusted workloads onto the cluster. By using proper taints and labels, all the untrusted workloads were scheduled to run on the untrusted node. Although the trusted and untrusted workloads were served by different container runtimes, they were able to communicate with each other. The task of deploying NetworkPolicy resources to enforce network boundaries between the trusted and untrusted domains is left as an exercise for the reader.