Containers changed how we deploy software. But the default container runtime still shares the host kernel — and that single fact is the source of most serious container escapes documented since 2017. gVisor and Kata Containers attack this problem differently, and understanding both is essential for any team running multi-tenant workloads or handling sensitive data.
This article goes beyond the marketing pitch. We’ll look at how each technology actually works, what the performance trade-offs are, and where each one belongs in a real production environment.
Why Standard Containers Are Not Isolation Boundaries
A Docker container using the default runc runtime shares the Linux kernel with every other container and with the host. Namespaces and cgroups create the illusion of separation, but they are access control mechanisms, not true isolation. Every container process makes syscalls directly to the host kernel.
This matters because the Linux kernel has a massive attack surface. A compromised container that can exploit a kernel vulnerability — CVE-2022-0492 (cgroups v1 escape), CVE-2019-5736 (runc overwrite) — can potentially gain root on the host or break out to adjacent containers.
Seccomp profiles, AppArmor, and SELinux policies help reduce this surface. But they require careful maintenance, frequently break legitimate workloads, and do not fundamentally change the architecture: kernel code is still trusted to protect itself.
The core problem is not that containers are misconfigured. It’s that the shared-kernel model requires the kernel to be perfect. It never is.
The Two Approaches to Stronger Isolation
Two architectures address this at a structural level:
- User-space kernel interception (gVisor): Intercept syscalls from the container and handle them in user space with a minimal Go-based kernel implementation.
- Hardware virtualization (Kata Containers): Run each container inside a lightweight VM, so each container gets its own kernel, and VM hardware boundaries enforce isolation.
These are fundamentally different trade-offs, and choosing between them depends on workload characteristics rather than one being universally better.
gVisor: A User-Space Kernel in Go
gVisor (open-sourced by Google in 2018) implements a Linux-compatible syscall surface in a user-space Go component called the Sentry. When a container process calls read(), connect(), or open(), the call is intercepted by the Sentry rather than going to the host kernel. The Sentry implements enough of the Linux API to run most containerized workloads while passing only a small set of approved operations down to the real kernel.
The architecture consists of two components:
- Sentry: The user-space kernel. Handles almost all syscalls. Written in Go with minimal use of Go's `unsafe` package. Runs as an unprivileged process.
- Gofer: A separate process that handles filesystem operations on behalf of Sentry. Runs with limited capabilities and communicates via the 9P protocol.
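To make the interception model concrete, here is a toy sketch in Python. It is purely illustrative: the real Sentry is a Go implementation of hundreds of Linux syscalls, and the names below are invented for the example. The point is the dispatch structure: emulate by default, and forward only an allowlisted few operations to the host.

```python
# Toy model of gVisor-style syscall interception (illustrative only).
HOST_ALLOWLIST = {"write_log"}  # hypothetical name, not a real syscall

def emulated_getpid():
    return 1  # the sandbox presents its own process view, not the host's

def emulated_uname():
    return {"release": "4.4.0"}  # gVisor reports a fixed kernel version

# Table of syscalls handled entirely in user space.
EMULATED = {
    "getpid": emulated_getpid,
    "uname": emulated_uname,
}

def dispatch(name, *args):
    if name in EMULATED:
        return EMULATED[name](*args)  # never touches the host kernel
    if name in HOST_ALLOWLIST:
        raise NotImplementedError("would forward to the host kernel here")
    raise PermissionError(f"syscall {name!r} denied by the sandbox")

print(dispatch("uname")["release"])  # -> 4.4.0
```

The security argument falls out of the structure: a guest process can only ever reach the host kernel through the small forwarded set, so a kernel bug in an emulated path is unreachable from the container.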
gVisor integrates with container runtimes through the OCI runtime spec. To use it with Docker or containerd:
```shell
# Install runsc (the gVisor runtime binary)
wget https://storage.googleapis.com/gvisor/releases/release/latest/x86_64/runsc
chmod +x runsc && sudo mv runsc /usr/local/bin/
```

Configure the Docker daemon to use runsc in `/etc/docker/daemon.json`:

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
```

Then run a container with gVisor:

```shell
docker run --runtime=runsc -it ubuntu:22.04 bash

# Verify: inside the container, the kernel version differs from the host's
uname -r
# 4.4.0 (gVisor's emulated kernel version, not your host's)
```
With Kubernetes, you use a RuntimeClass:
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
metadata:
  name: secure-workload
spec:
  runtimeClassName: gvisor
  containers:
    - name: app
      image: myapp:latest
```
gVisor Performance Characteristics
The syscall interception model has measurable overhead. Published benchmarks from Google and independent researchers show:
- Syscall-heavy workloads (many small file operations, frequent network calls): 2x–5x slower than native
- CPU-bound workloads (numerical computation, in-memory processing): near-native performance
- Memory allocation: slightly higher latency due to Sentry’s memory management
- Startup time: slightly higher than runc, but much lower than full VMs
For workloads like web servers handling HTTP requests with database calls, the overhead is typically 10–20% in real-world measurements, not the worst-case numbers seen in micro-benchmarks.
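Published numbers only go so far; the cheapest way to see where your workload sits on this spectrum is to time a syscall-heavy loop against a CPU-bound one under each runtime. A minimal sketch, to be run once under runc and once under runsc so the ratios can be compared (the iteration count and the choice of `os.stat` as the syscall-heavy operation are arbitrary; substitute operations your application actually performs):

```python
import os
import time

def time_it(fn, n):
    """Wall-clock time for n calls of fn."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return time.perf_counter() - start

N = 100_000

# Syscall-heavy: every iteration crosses the syscall boundary, which is
# exactly where gVisor's interception overhead lives.
syscall_heavy = time_it(lambda: os.stat("/tmp"), N)

# CPU-bound: pure computation with no kernel involvement, so it should
# run near-native under both gVisor and Kata.
cpu_bound = time_it(lambda: sum(i * i for i in range(50)), N)

print(f"syscall-heavy: {syscall_heavy:.3f}s  cpu-bound: {cpu_bound:.3f}s")
```

If the syscall-heavy ratio between the two runtimes is large while the CPU-bound ratio is near 1.0, your workload matches the profile where Kata tends to win.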
Kata Containers: VMs That Behave Like Containers
Kata Containers uses hardware virtualization. Each container (or pod in Kubernetes) runs inside a dedicated lightweight VM with its own kernel. An attack that escapes the container still has to escape the VM, and hypervisor escapes are rare: the hypervisor's attack surface is far smaller and more heavily audited than the kernel's, and a compromised guest kernel is not shared with any other container.
The architecture involves:
- kata-runtime: The OCI-compatible runtime that boots a VM for each container
- kata-agent: A process inside the VM that handles the container lifecycle
- Hypervisor: QEMU, Cloud Hypervisor, or Firecracker can be used as the backend
The Firecracker backend is the most production-hardened option. Firecracker (open-sourced by AWS in 2018) is the same VMM that powers AWS Lambda and AWS Fargate; it boots a microVM in under 125ms with a minimal device model.
```shell
# Install Kata Containers on Ubuntu 22.04
bash -c "$(curl -fsSL https://raw.githubusercontent.com/kata-containers/kata-containers/main/utils/kata-manager.sh) install-packages"

# Verify installation
kata-runtime check
```

Configure containerd to use Kata with Firecracker in `/etc/containerd/config.toml`:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-fc]
  runtime_type = "io.containerd.kata-fc.v2"
```

Then define a Kubernetes RuntimeClass for Kata:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-containers
handler: kata-fc
---
# Apply to a pod
spec:
  runtimeClassName: kata-containers
```
Kata Performance Characteristics
Kata’s overhead profile is different from gVisor’s:
- Startup latency: 100–500ms depending on hypervisor and kernel size (gVisor is faster)
- Steady-state performance: Near-native for most workloads, because the guest kernel is a real Linux kernel
- Memory overhead: Each VM requires its own kernel memory (~64–128MB per container with a slimmed kernel)
- Filesystem operations: Slightly slower due to virtio-fs overhead, but much better than early Kata versions
For syscall-heavy workloads that perform poorly in gVisor, Kata often delivers much better performance because it uses a real kernel rather than a user-space emulation.
Direct Comparison: gVisor vs Kata Containers
| Factor | gVisor (runsc) | Kata Containers |
|---|---|---|
| Isolation mechanism | Syscall interception, user-space kernel | Hardware VM boundary |
| Kernel sharing | No (own kernel in Go) | No (own kernel per VM) |
| Syscall-heavy perf | Degraded (2x–5x) | Near-native |
| Startup latency | Low (comparable to runc) | Moderate (100–500ms) |
| Memory overhead | Low | Moderate (~64–128MB/VM) |
| Kernel CVE protection | Strong (Go kernel, small surface) | Strong (VM boundary) |
| Compatibility | ~90% of workloads | ~98% of workloads |
| Best for | Untrusted code, multi-tenant SaaS | High-perf isolation, regulated workloads |
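The table's trade-offs reduce to a rough decision rule. The sketch below is a heuristic distilled from the comparison above, not an official recommendation from either project; the returned handler names match the RuntimeClass examples earlier in the article.

```python
def choose_runtime(untrusted: bool, syscall_heavy: bool) -> str:
    """Rough heuristic distilled from the comparison table (illustrative)."""
    if not untrusted:
        return "runc"      # trusted workload: the default runtime is fine
    if syscall_heavy:
        return "kata-fc"   # VM boundary avoids gVisor's interception overhead
    return "runsc"         # gVisor: strong isolation, low startup, high density

print(choose_runtime(untrusted=True, syscall_heavy=False))  # -> runsc
```

Real decisions also weigh startup latency, memory budget, and compatibility test results, but this captures the first-order split.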
Real-World Deployment Patterns
Multi-Tenant SaaS: Use gVisor
If you run customer code — CI/CD pipelines, serverless functions, user-uploaded scripts — gVisor is the right choice. Its low overhead makes it viable even for high-density deployments. Google Cloud Run's first-generation execution environment runs every container under gVisor. The startup latency advantage over Kata is significant when scaling to zero and back is frequent.
Regulated Industry Workloads: Use Kata
Healthcare, finance, and government workloads subject to compliance standards like HIPAA, PCI-DSS, or FedRAMP benefit from Kata’s VM-level isolation. Auditors understand VMs. The isolation model is clear and verifiable. When a workload’s performance requirements rule out gVisor’s syscall overhead, Kata gives you isolation without giving up throughput.
Defense in Depth: Layer Both with seccomp
Neither solution replaces good security hygiene. Even with gVisor or Kata, apply seccomp profiles, drop Linux capabilities, and run containers as non-root. These layers are cheap and stack multiplicatively against an attacker’s effort.
```shell
# A minimal seccomp profile for a web server, applied to a gVisor container
# This restricts even what Sentry can pass down to the host kernel
docker run --runtime=runsc \
  --security-opt seccomp=/etc/docker/seccomp-profiles/web-server.json \
  --user 1001:1001 \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  myapp:latest
```
Limitations You Should Know Before Deploying
gVisor Compatibility Gaps
gVisor does not implement every Linux syscall. Common incompatibilities include:
- Workloads that use `/proc` heavily in unusual ways
- Applications using `io_uring` (support added but incomplete as of 2025)
- FUSE mounts
- Some eBPF operations (which are themselves a kernel interface)
Testing with gVisor before production is non-negotiable. Run your full integration test suite inside runsc and observe failures before deploying.
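A cheap first probe, before committing to a full test-suite run: check from inside the sandbox whether a syscall your application depends on is implemented at all. A sketch for io_uring, assuming an x86_64 or arm64 Linux host (425 is `io_uring_setup` in the asm-generic numbering); this is a quick diagnostic, not proof of complete io_uring support:

```python
import ctypes
import errno

# Probe whether the kernel (real, or gVisor's emulated one) implements
# the io_uring_setup syscall at all.
libc = ctypes.CDLL(None, use_errno=True)
SYS_io_uring_setup = 425  # x86_64 and arm64 syscall number

# Deliberately invalid arguments: a kernel that has the syscall rejects
# them with EINVAL/EFAULT, while one without it returns ENOSYS. A seccomp
# filter denying it may instead surface as EPERM, so treat results as a
# hint, not a verdict.
ret = libc.syscall(SYS_io_uring_setup, 0, None)
err = ctypes.get_errno()

supported = not (ret == -1 and err == errno.ENOSYS)
print("io_uring syscall implemented here:", supported)
```

Running the same probe under runc and under runsc shows immediately whether a gap is in gVisor or in your base image.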
Kata Overheads at Scale
At large scale, Kata’s per-VM memory overhead accumulates. A node running 200 containers uses an additional ~25GB of RAM just for VM kernels at 128MB each. Slim the kernel image (kata-containers provides prebuilt stripped kernels), and set explicit memory limits for each pod to keep this manageable.
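The arithmetic is worth making explicit when sizing nodes. A quick sketch, using 128MB as the upper end of the per-VM overhead range quoted above (substitute your measured figure):

```python
# Node-level RAM overhead from Kata's per-container guest kernels.
per_vm_overhead_mb = 128   # upper bound from the range above; measure yours
containers_per_node = 200

overhead_gb = per_vm_overhead_mb * containers_per_node / 1024
print(f"extra RAM consumed by VM kernels: {overhead_gb:.1f} GB")
# prints "extra RAM consumed by VM kernels: 25.0 GB"
```

With a slimmed kernel at 64MB the same node pays half that, which is why trimming the guest image is the first optimization to reach for.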
Getting Started: A Practical Checklist
- Identify workloads that handle untrusted input, multi-tenant data, or sensitive regulated data
- Test those workloads under gVisor first — it covers ~90% of cases and has lower overhead
- For workloads that fail gVisor compatibility testing or require near-native syscall performance, evaluate Kata with Firecracker
- Define Kubernetes RuntimeClasses for each security tier: default (runc), isolated (gvisor), highly-isolated (kata)
- Add namespace-level admission policies (OPA Gatekeeper or Kyverno) to enforce RuntimeClass usage for sensitive namespaces
- Benchmark your specific workloads — generic benchmarks rarely reflect your actual syscall patterns
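For the admission-policy step, a Kyverno policy can pin a namespace to a RuntimeClass. A sketch, assuming Kyverno is installed and using a hypothetical `tenant-workloads` namespace; an equivalent OPA Gatekeeper constraint works the same way:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-gvisor-in-tenant-namespaces
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-runtime-class
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - tenant-workloads   # hypothetical namespace name
      validate:
        message: "Pods in this namespace must run under the gvisor RuntimeClass."
        pattern:
          spec:
            runtimeClassName: gvisor
```

Enforcing at admission time means a forgotten `runtimeClassName` fails loudly at deploy rather than silently landing on runc.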
Conclusion
Standard containers are excellent for trusted, well-understood workloads. For anything that runs code from external sources, processes sensitive data, or faces compliance requirements, the shared-kernel model is an architectural risk that seccomp alone cannot adequately address.
gVisor and Kata Containers are production-ready solutions used by Google, AWS, and teams across regulated industries. They are not exotic or experimental. The tooling to deploy them via Kubernetes RuntimeClasses is mature and well-documented. The only thing stopping most teams is the time to test workload compatibility and adjust resource planning.
Start with your most sensitive workloads, run the compatibility tests, and benchmark against your real traffic patterns. The isolation improvements are significant; the trade-offs are manageable.
