Kata Containers + Service Mesh

Why service mesh?

Kata Containers 1.0 was released during the OpenStack Summit in Vancouver in May 2018. The next steps for our team were to evaluate how Kata integrates with other components of the Kubernetes ecosystem. I chose to test two popular service mesh products for Kubernetes, Istio and Conduit.

Both service mesh products are used for monitoring, controlling and securing the traffic between microservices.

How do they work?

Istio and Conduit use the same model: they run controller applications in the control plane and inject a proxy as a sidecar inside each service pod. The proxy first registers with the control plane, then constantly reports different types of information about the service running inside the pod. It gathers this information by intercepting and filtering all the traffic originally intended for the service. This interaction between the control plane and the proxy lets you apply load balancing and authentication rules to the traffic flowing between microservices inside the cluster.

The proxy magic could not happen without a good number of iptables rules ensuring the packets reach the proxy instead of the service they were addressed to. Those rules are set up by an init container, because they must be in place by the time the proxy starts.
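To make this concrete, here is a rough sketch of the kind of redirection rules such an init container installs. The chain name, the proxy user ID, and port 15001 (the port Istio's proxy listens on by default) are illustrative assumptions, not the exact rules either mesh uses:

```sh
# Sketch only: chain name, port, and UID are illustrative assumptions.
iptables -t nat -N PROXY_REDIRECT
iptables -t nat -A PROXY_REDIRECT -p tcp -j REDIRECT --to-ports 15001

# Send all inbound TCP traffic to the proxy instead of the service.
iptables -t nat -A PREROUTING -p tcp -j PROXY_REDIRECT

# Send outbound TCP traffic through the proxy too, but skip packets
# generated by the proxy's own user to avoid a redirection loop.
iptables -t nat -A OUTPUT -p tcp -m owner ! --uid-owner 1337 -j PROXY_REDIRECT
```

The REDIRECT target and the owner match are why the init container needs the NET_ADMIN capability, as discussed below.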

Service mesh with Kata Containers

Now that we understand the overhead introduced by the service meshes, let's try to make them work with Kata Containers.

First of all, kata-runtime will not be used if any container in the pod is marked as privileged, because privileged means running with root access to the devices and the network on the host.

When you use Istio's istioctl kube-inject ... to inject the sidecar and the init container into the YAML file defining your service, the init container gets defined as privileged. This is only needed for specific use cases, such as running in an SELinux-enabled environment or debugging with gdb; the common case only needs the NET_ADMIN capability to use iptables. Conduit simply requests NET_ADMIN without the privileged flag, which makes things even simpler.

Once you have injected your YAML file with the appropriate command and replaced privileged: true with privileged: false, add the untrusted annotation to your deployment. This combination ensures that your Container Runtime Interface (CRI) implementation will choose kata-runtime to run the described pod and containers, assuming it has been properly configured on your system.
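As an illustrative sketch, the edited deployment ends up looking roughly like the fragment below. The annotation key shown is the one CRI-O understands; other CRI implementations may use a different key, and the names and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service              # placeholder name
spec:
  template:
    metadata:
      annotations:
        # Marks the pod as untrusted so the CRI implementation picks
        # kata-runtime; this key is CRI-O's, other runtimes may differ.
        io.kubernetes.cri-o.TrustedSandbox: "false"
    spec:
      initContainers:
      - name: init              # the injected init container
        image: init-image       # placeholder; keep the injected image
        securityContext:
          privileged: false     # changed from true after injection
          capabilities:
            add: ["NET_ADMIN"]  # all the init container really needs
```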

It’s important to know that Kata Containers runs a virtual machine (VM) for every pod, which means the guest kernel has to provide support for netfilter and iptables. Kata Containers lets you use your own custom kernel, but you must make sure it meets the service mesh requirements for the proxy and the init container.
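As a sketch of what to check in a custom guest kernel configuration, these are the sorts of options the proxy's iptables rules rely on (non-exhaustive, and exact symbols vary across kernel versions):

```
CONFIG_NETFILTER=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_NAT=y
CONFIG_IP_NF_TARGET_REDIRECT=y     # REDIRECT target for the nat table
CONFIG_NETFILTER_XT_MATCH_OWNER=y  # owner match, used to skip the proxy's own traffic
```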

That's it, that's how simple it is to get Kata working with your favorite service mesh for Kubernetes.

Fun fact

Although this looks pretty straightforward, I ran into a weird issue while enabling the service meshes with Kata. When I started the service pod, the proxy registered properly with the control plane but did not forward traffic to the service it fronted. A single connection was detected on the first HTTP request, and after that, nothing...

It turned out the proxy retrieved the original destination of each connection through the getsockopt() system call with the SO_ORIGINAL_DST option, and this call was deadlocking on the socket lock. The proxy could not redirect traffic to the service because it was never able to talk to it.

Thanks to this issue opened by hawkw on the Conduit GitHub repository, it was obvious I was using the wrong kernel version. After a quick try with kernel 4.14.49, the issue was resolved and Kata worked as expected.

Troubleshooting and debug

When I ran into the issue mentioned above, I used several tools and techniques that you may also find useful. Since your Kata pod runs inside a VM, make sure you capture the traffic both on the host, listening on the CNI bridge created by your CNI plugin, and in the guest, listening on any interface inside the pod. With this setup you will not miss anything, and you can verify that traffic properly reaches the network inside the VM.

For listening, tcpdump will be your best friend. To use it, tcpdump must be available in the rootfs of one of your containers. You can use the proxy container, which ships with it, or add an extra container based on an image that gives you access to all the tools you need. Whichever container you choose, you must grant it certain key capabilities, such as NET_ADMIN and SYS_ADMIN, to avoid permission errors.
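For instance, such an extra debugging container could be declared like this (the name and image are placeholders; any image shipping tcpdump works):

```yaml
containers:
- name: debug
  image: nicolaka/netshoot          # placeholder: any image with tcpdump
  command: ["sleep", "infinity"]    # keep the container alive for kubectl exec
  securityContext:
    capabilities:
      add: ["NET_ADMIN", "SYS_ADMIN"]  # needed for packet capture
```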

On the host, the tcpdump command looks like:
tcpdump -i cni0 -w host_traffic.pcap
From the guest, run the following command:
tcpdump -i any -w guest_traffic.pcap

The guest_traffic.pcap file lives inside the guest, so retrieve it with kubectl cp. This way you will have both capture files on the host. A tool like Wireshark will make reading them much easier.
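Assuming the capture was written to /tmp inside a container named proxy of pod my-service-pod (all three names are placeholders), the copy looks like:

```sh
kubectl cp my-service-pod:/tmp/guest_traffic.pcap guest_traffic.pcap -c proxy
```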

These powerful tools help you see all the traffic. They can be combined with netstat -plunt, iptables-save, or arp -n to understand what is happening inside your cluster and particularly inside your VM.

Try it yourself

I hope you learned something reading this post, and that you are ready to play with service mesh secured by Kata Containers.

Please reach out on our mailing list, lists.katacontainers.io, if you have interesting use cases to share with the community or if you need help or support.