Layer 2: Hardened K3s

Install Kubernetes so locked-down that it passes the CIS security benchmark out of the box — every secret encrypted, every door closed.

[Figure: the defense-in-depth stack between attacker and protected core. L0 Hardened Host OS; L1 Immutable VM (KVM); L2 CIS-Hardened k3s (audit logging, PSA restricted, least-privilege RBAC, secrets encryption, etcd; checked with kube-bench); L3 Sealed Runtime; L4 Zero-Trust Network; L5 Encrypted Storage; L6 Always-On Observability; at the center, your workload: encrypted, sandboxed, watched.]
Layer 2 — the hardened Kubernetes control plane at the center of the stack.

By the end of this chapter you will have a running Kubernetes cluster where: secrets are encrypted in the database before they ever touch disk, every container is forced into a security sandbox, an audit trail records every sensitive action, and the cluster’s control port is invisible to the public internet. You will also know how to prove all of this with an automated scanner.

What is k3s, and why use it here?

Kubernetes is the software that decides which containers run on which machines, restarts them when they crash, and handles networking between them. A full Kubernetes installation has a dozen separate programs. k3s is Rancher’s single-binary packaging of all of them — the same technology, a fraction of the footprint. On a homelab machine it starts in seconds and uses a few hundred megabytes of RAM instead of several gigabytes.

More importantly for this guide: k3s ships with several hardening defaults that vanilla Kubernetes does not. We will build on those defaults rather than work around them.

In plain English: Kubernetes is the manager of your containers. It knows where everything is running and makes sure it keeps running. K3s is a smaller, pre-configured version of that manager, designed for machines that aren’t enormous cloud servers. Think of it as the same job title, but a leaner employee who already knows the security rules.

What k3s gives you for free

Before adding anything, k3s already does:

  • RBAC enabled by default. Every action inside the cluster requires an explicit permission. Nothing works by default except what k3s itself needs to function.
  • Node restriction — a Kubernetes node (the machine running containers) can only see and modify its own resources, not those of other nodes.
  • Kubelet client-certificate authentication. The kubelet (the per-node agent) proves its identity with a certificate before the API server talks to it.
  • Bound service account tokens. When a pod needs to talk to Kubernetes itself, it gets a short-lived, scoped token — not an old-style permanent secret.

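In plain English: RBAC is a permission system: every user and every program inside the cluster has to show its badge before doing anything. k3s ships with RBAC switched on. A bare kube-apiserver started without an --authorization-mode flag falls back to AlwaysAllow, so in plain Kubernetes RBAC depends on the installer turning it on.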

What needs manual work

The gaps that remain — each closed in this chapter:

Gap | Why it matters
Secrets encryption at rest | Without it, anyone who copies the database gets every password in plain text
Audit logging | Without it you have no record of who did what
Pod security enforcement | Without it any container can run as root and escape to the host
Read-only port closed | CVE-2025-46599: left open, it exposes every pod’s environment variables with zero auth
TLS cipher restriction | Prevents downgrade attacks to weak encryption
Unused components disabled | Smaller attack surface
Default service account tokens revoked | Prevents containers from talking to Kubernetes when they have no reason to

The one rule you cannot break: secrets encryption at first boot

Caution: secrets-encryption: true must be set before k3s runs for the first time. You cannot enable it on an existing cluster without a full re-encryption procedure that risks data loss. The install script enforces this. Do not run k3s manually before running the script.

When secrets encryption is on, every Kubernetes Secret object is encrypted with secretbox (XSalsa20-Poly1305) before being written to the etcd database — this is the k3s default provider since v1.32.4, and is what this guide’s v1.36 build uses. An attacker who steals a copy of the database gets ciphertext, not your actual secrets. The encryption key is stored separately on the server and can be rotated without restarting the cluster.

In plain English: Think of the database as a filing cabinet. Without encryption, the filing cabinet stores documents in plain text, and anyone who picks the lock reads everything. With secrets encryption, the filing cabinet stores scrambled text, and the decoder ring lives somewhere else. Stealing the cabinet alone gets you nothing.

Layer 0 dependency: kernel parameters

protect-kernel-defaults: true in the k3s config tells the kubelet: “before you do anything, check that the Linux kernel is configured the way you expect. If it isn’t, refuse to start.” This is a safety net — it catches machines where someone changed a kernel setting that Kubernetes relies on.

The required kernel settings are written by Layer 0 (/etc/sysctl.d/90-kubelet.sysctl.conf). The install script checks they are applied before touching k3s. If you run this script on a fresh machine without running Layer 0 first, it will stop and tell you.

In plain English: It is like a car that checks its own oil level before the engine will start. The kernel parameters are the oil. Layer 0 put the oil in. If the oil is missing, k3s refuses to move.
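The check described above can be sketched as a small shell function. The four key/value pairs are the ones listed in the upstream k3s hardening guide and are assumed to match what Layer 0 writes; SYSCTL_ROOT and check_kubelet_sysctls are hypothetical names used only for this illustration.

```shell
#!/usr/bin/env bash
# Compare live kernel parameters with the values the kubelet expects when
# protect-kernel-defaults is on. Returns non-zero if any value differs.
check_kubelet_sysctls() {
  # SYSCTL_ROOT is a hypothetical override so the check can read a test tree
  local root="${SYSCTL_ROOT:-/proc/sys}" rc=0 pair key want have
  # Assumed values, taken from the upstream k3s hardening guide
  for pair in vm.panic_on_oom=0 vm.overcommit_memory=1 kernel.panic=10 kernel.panic_on_oops=1; do
    key="${pair%%=*}"
    want="${pair#*=}"
    # A sysctl name maps to a path under /proc/sys with dots turned into slashes
    have="$(cat "$root/${key//.//}" 2>/dev/null)"
    if [ "$have" = "$want" ]; then
      echo "ok   $key=$have"
    else
      echo "FAIL $key=${have:-unset} (want $want)"
      rc=1
    fi
  done
  return "$rc"
}
```

Calling check_kubelet_sysctls on the server before starting k3s prints one line per parameter; a FAIL line points at the setting to fix in Layer 0.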

The datastore: embedded etcd

k3s ships two embedded datastore options: SQLite (simple, single-node only) and embedded etcd (production-grade, HA-ready). We use embedded etcd (cluster-init: true), because:

  1. The CIS hardening guide assumes etcd; some benchmark checks only apply to etcd.
  2. k3s encrypts the confidential bootstrap data held in etcd, and therefore in every snapshot, with AES-256 keyed from the cluster join token.
  3. If you ever add a second node for high availability, etcd supports it; SQLite does not.

In plain English: SQLite is a basic notebook. etcd is a bank vault with a backup system and a combination lock. Both store the same data; one is ready for a real security posture.

Important: etcd snapshots contain all cluster data, including CA private keys. The key that protects that confidential data is derived from the cluster join token, so store the token and the snapshots in separate places. Anyone with both can decrypt everything.

The config file — every setting explained

K3s reads a single YAML file at startup: /etc/rancher/k3s/config.yaml. The hardened version lives at scripts/config/k3s/config.yaml. Every flag is explained below.

/etc/rancher/k3s/config.yaml (excerpt — see full file)
cluster-init: true          # use embedded etcd, not SQLite
secrets-encryption: true    # encrypt secrets in etcd — MUST be set at first boot
protect-kernel-defaults: true  # refuse to start if kernel params are wrong
write-kubeconfig-mode: "0600"  # kubeconfig readable only by root

tls-san:
  - "10.0.0.1"        # your WireGuard IP — edit this
  - "k3s.internal"    # optional hostname

disable:
  - traefik            # replace with a hardened ingress controller if needed
  - servicelb          # replace with MetalLB or kube-vip
  - local-storage      # replace with Longhorn or a CSI driver
  - metrics-server     # deploy your own if needed

# Layer 4 (Cilium) prep: disable the built-in network stack
# so Cilium can replace it completely.
flannel-backend: none
disable-network-policy: true
disable-kube-proxy: true

Note: The three lines at the bottom (flannel-backend: none, disable-network-policy: true, disable-kube-proxy: true) leave the cluster with no network until Layer 4 installs Cilium. Do not try to run workloads between Layer 2 and Layer 4.

In plain English: Flannel is k3s’s built-in “how containers talk to each other” system. We turn it off because Layer 4 installs a much more powerful replacement (Cilium). Until that replacement arrives, containers cannot talk to each other, which is fine because we haven’t put any workloads in yet.

API server hardening flags

These go under kube-apiserver-arg: — they tune the gatekeeper that every kubectl command talks to.

Flag | What it does
anonymous-auth=false | Rejects any request with no identity (already the k3s default; explicit for auditability)
enable-admission-plugins=... | Turns on four extra security checks at pod admission time (see PSA section)
admission-control-config-file=... | Points to the PSA + EventRateLimit config
audit-log-path=... and three related flags | Turns on audit logging with 30-day retention
tls-cipher-suites=... | Restricts TLS to forward-secret, authenticated encryption only (ECDHE + AES-GCM / ChaCha20)
tls-min-version=VersionTLS12 | Rejects TLS 1.0 and 1.1 connections
service-account-extend-token-expiration=false | Disables the legacy long-lived token extension
request-timeout=300s | Caps how long a single API request can run

Kubelet hardening flags

These go under kubelet-arg: — they tune the per-node agent.

Flag | What it does
read-only-port=0 | Closes port 10255 (CVE-2025-46599, see below)
streaming-connection-idle-timeout=5m | Times out idle exec/attach sessions
event-qps=5 / event-burst=10 | Rate-limits events from kubelet
tls-cipher-suites=... | Same cipher restriction as the API server
seccomp-default=true | Applies a system-call filter to every container by default (Layer 3 depends on this)
rotate-server-certificates=true | Auto-renews kubelet TLS certs before they expire
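A quick way to confirm that the flags from the two tables actually made it into the config file is a fixed-string search. This is only a sketch: check_k3s_args is a hypothetical helper, it covers just the flags whose full values appear in the tables (the elided ones are skipped), and it does not check which section (kube-apiserver-arg or kubelet-arg) a flag sits under.

```shell
#!/usr/bin/env bash
# Report which hardening flags are present in a k3s config file.
# Returns non-zero if any expected flag is missing.
check_k3s_args() {
  local cfg="${1:-/etc/rancher/k3s/config.yaml}" rc=0 arg
  for arg in \
    anonymous-auth=false tls-min-version=VersionTLS12 \
    service-account-extend-token-expiration=false request-timeout=300s \
    read-only-port=0 streaming-connection-idle-timeout=5m \
    seccomp-default=true rotate-server-certificates=true
  do
    # Drop commented-out lines first, then look for the flag as a fixed string
    if grep -v '^[[:space:]]*#' "$cfg" | grep -Fq -- "$arg"; then
      echo "ok      $arg"
    else
      echo "MISSING $arg"
      rc=1
    fi
  done
  return "$rc"
}
```

Run it as root with no argument to check the live file, or pass scripts/config/k3s/config.yaml to check the copy in the repository before installing.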

CVE-2025-46599 — the read-only port regression

Important: CVSS 7.5 High. k3s 1.32 before 1.32.4 accidentally re-enabled kubelet port 10255. Port 10255 requires zero authentication and exposes pod metadata, environment variables (which often contain secrets), and node information.

The bug: a Go programming quirk (omitempty) silently stripped the explicit 0 value from the kubelet config, reverting to the compiled default — which is 10255. The fix in k3s 1.32.4+ corrects this. We are on 1.36.1 (unaffected), but we also set read-only-port=0 explicitly as belt-and-suspenders — if a future regression repeats the bug, the explicit flag wins.

Verify port is closed (run on the server)
# Should fail with "connection refused"; any HTTP response means the port is open
curl -s --max-time 5 -o /dev/null http://127.0.0.1:10255/pods && echo "PORT IS OPEN: INVESTIGATE" || echo "port 10255 closed OK"

In plain English: Imagine a side entrance to your house that was supposed to be locked but kept unlocking itself due to a software glitch. The fix is to deadbolt it and also add a padlock: two independent mechanisms so a future glitch can’t undo both at once.

Audit logging

The audit log is a chronological record of every sensitive action taken inside the cluster: who ran kubectl exec, what pods were created, who changed a permission. It is the cluster’s CCTV footage.

Kubernetes audit logging requires four flags and a policy file. If any piece is missing, logging silently fails. The install script writes everything before k3s starts.

The policy (scripts/config/k3s/audit-policy.yaml) uses a layered approach:

  • Suppressed entirely: health checks, leader-election churn — pure noise with zero security value.
  • Metadata only: secrets, configmaps, service accounts — logs who accessed them but never logs the actual value (so your secrets do not appear in the audit log).
  • Full request + response: exec/attach/port-forward, RBAC changes, token reviews — high-risk operations where you want every detail.
  • Request body only: pod and workload mutations, network policy changes.
  • Everything else: metadata level — nothing goes unlogged.

Audit logs land at /var/lib/rancher/k3s/server/logs/audit.log, rotated at 100 MB, kept for 30 days, up to 10 files.

Tip: For alerting, forward the audit log to a SIEM with Promtail → Loki or Filebeat → Elasticsearch. The critical alert rules: any pods/exec from a non-operator identity, any clusterrolebindings mutation, any secrets delete verb.
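Before a SIEM is wired up, the three alert conditions can be approximated by filtering the audit log directly. The sketch below assumes jq is installed and that the log is in the API server's default JSON-lines format; audit_alerts is a hypothetical helper, and it prints every matching event, leaving the "non-operator identity" filter to you.

```shell
#!/usr/bin/env bash
# Print audit events matching the three critical alert rules, one per line:
# timestamp, user, verb, resource, namespace, name (tab-separated).
audit_alerts() {
  local log="${1:-/var/lib/rancher/k3s/server/logs/audit.log}"
  jq -r '
    def mutating: . == "create" or . == "update" or . == "patch" or . == "delete";
    select(
      (.objectRef.resource == "pods" and .objectRef.subresource == "exec")
      or (.objectRef.resource == "clusterrolebindings" and (.verb | mutating))
      or (.objectRef.resource == "secrets" and .verb == "delete")
    )
    | [ .requestReceivedTimestamp, .user.username, .verb,
        (.objectRef.resource
          + (if .objectRef.subresource then "/" + .objectRef.subresource else "" end)),
        (.objectRef.namespace // "-"), (.objectRef.name // "-") ]
    | @tsv
  ' "$log"
}
```

Piping the output through grep -v for your operator account names leaves only the events worth a closer look.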

Pod Security Admission — every container in a restricted sandbox

Pod Security Admission (PSA) is the Kubernetes mechanism that inspects every container definition before it runs and rejects anything that violates your security policy. It replaced the older PodSecurityPolicy in Kubernetes 1.25.

We enforce the restricted standard cluster-wide. What restricted requires of every container:

  • Run as a non-root user
  • Drop all Linux capabilities
  • No privilege escalation allowed
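  • A seccomp profile must be set explicitly in the pod spec (RuntimeDefault or Localhost); seccomp-default=true on the kubelet filters pods that omit one, but does not by itself satisfy this check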
  • No access to the host’s network, PID namespace, or IPC namespace

In plain English: Think of restricted as a strict rule: every container has to work as a regular user, cannot use any special system tools, and is completely isolated from the machine it runs on. If a container definition breaks any of these rules, Kubernetes refuses to start it.

Three namespaces are exempted: kube-system, kube-public, and kube-node-lease. kube-system runs cluster infrastructure (CoreDNS and, from Layer 4 on, the CNI agent) that legitimately needs privileged access; embedded etcd runs inside the k3s server process rather than as a pod. Every namespace you create gets restricted enforcement automatically.

The config lives at scripts/config/k3s/psa.yaml. It is passed to the API server via admission-control-config-file.

scripts/config/k3s/psa.yaml (key lines)
defaults:
  enforce: "restricted"
  enforce-version: "latest"
  audit: "restricted"     # violations are annotated in the audit log (exempted namespaces are skipped entirely)
  warn: "restricted"      # kubectl prints a warning even for non-enforced violations
exemptions:
  namespaces:
    - kube-system
    - kube-public
    - kube-node-lease

Important: When you add a monitoring namespace that runs node-exporter (which needs hostPID: true), add that namespace to the exemptions list and tightly scope its RBAC. Do the same for any other infrastructure workload that legitimately needs elevated privileges.
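One way to see the restricted standard in action without starting anything is a server-side dry run, which sends the pod through admission and then discards it. The sketch below is an illustration only: psa_probe, the pod name, and the pause image are assumptions, and the manifest sets only the fields discussed in this section.

```shell
#!/usr/bin/env bash
# Submit a probe pod with --dry-run=server so admission runs but nothing starts.
# $1: value for allowPrivilegeEscalation (true or false); $2: namespace.
# Returns 0 if the API server would admit the pod, non-zero if it is rejected.
psa_probe() {
  local escalate="$1" ns="${2:-default}"
  kubectl apply --dry-run=server -n "$ns" -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: psa-probe            # hypothetical name, never actually created
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: probe
      image: registry.k8s.io/pause:3.9
      securityContext:
        allowPrivilegeEscalation: ${escalate}
        capabilities:
          drop: ["ALL"]
EOF
}

# Usage (over WireGuard):
#   psa_probe false && echo "compliant pod admitted"
#   psa_probe true  || echo "violating pod rejected, as expected"
```

In a non-exempt namespace the second call should fail with a message naming the violated control; in kube-system both calls should be admitted.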

TLS — what encryption actually means here

Every connection to and from the k3s API server uses TLS — the same encryption protocol as HTTPS websites. TLS has had several versions; the older ones (1.0, 1.1) have known weaknesses. We set tls-min-version=VersionTLS12 to reject them.

Even within TLS 1.2, some cipher suites (the specific algorithm combination) are weak. We explicitly list only suites that use:

  • ECDHE (Elliptic Curve Diffie-Hellman Ephemeral) — means even if someone records the traffic now and cracks the key later, they still cannot decrypt old sessions
  • AES-256-GCM or AES-128-GCM or ChaCha20-Poly1305 — authenticated encryption that detects tampering

In plain English: “Forward secrecy” means: even if an attacker records all your encrypted traffic today and somehow steals the server’s private key in 5 years, they still cannot decrypt the old traffic. Each session used a different one-time key. ECDHE is what makes that possible.

Default service account tokens — revoked

Every Kubernetes namespace has a default service account. Every pod that does not explicitly choose a service account uses this one. By default, k3s mounts its API token into every pod — meaning any container that gets compromised can immediately start talking to the Kubernetes API.

We patch automountServiceAccountToken: false on the default service account in every namespace. Workloads that genuinely need API access create a dedicated service account with the minimum permissions required.

Patch all namespaces (run after cluster creation)
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  kubectl patch serviceaccount default -n "$ns" \
    -p '{"automountServiceAccountToken": false}'
done

The install script runs this loop automatically. Re-run it whenever you add a namespace.
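To confirm the patch took effect everywhere, including namespaces created after the last run, a read-only companion loop can list the namespaces whose default service account still automounts a token. list_automount_offenders is a hypothetical helper name for this sketch.

```shell
#!/usr/bin/env bash
# Print every namespace whose default service account does not have
# automountServiceAccountToken set to false. No output means all are patched.
list_automount_offenders() {
  local ns value
  for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
    value="$(kubectl get serviceaccount default -n "$ns" \
      -o jsonpath='{.automountServiceAccountToken}')"
    # An unset field prints as an empty string, which also counts as an offender
    [ "$value" = "false" ] || echo "$ns"
  done
}
```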


Getting kubectl access over WireGuard

Port 6443 (the Kubernetes API server) is never exposed on a public interface. All kubectl access goes through the WireGuard tunnel established in Layer 4.

On the k3s server (to retrieve the kubeconfig):

On the k3s server (root)
sudo cat /etc/rancher/k3s/k3s.yaml

On your kubectl client machine:

On your client machine
mkdir -p ~/.kube
# Paste the above output into ~/.kube/config
# Then edit the server address:
sed -i 's|https://127.0.0.1:6443|https://10.0.0.1:6443|' ~/.kube/config
# Replace 10.0.0.1 with your actual WireGuard IP
chmod 600 ~/.kube/config

# Test (requires WireGuard to be up):
kubectl get nodes

Important: The WireGuard IP must match a value in tls-san in config.yaml, otherwise TLS will reject the connection with a certificate error. If you get a TLS error, re-check tls-san, then regenerate certs: k3s certificate rotate.

In plain English: The kubeconfig file is like an access card. It tells kubectl where the server is and includes a certificate proving the server is who it says it is. Without a WireGuard connection to the private IP, the access card points to an address that doesn’t exist from the public internet, which is exactly what we want.

Verifying with kube-bench

kube-bench is an open-source tool from Aqua Security that runs the full CIS Kubernetes Benchmark check list against your cluster and tells you which controls pass, which fail, and exactly what command to run to fix each failure.

kube-bench auto-detects the correct CIS profile for the k3s version it finds on the node — you do not need to pin a profile. If you ever need to override (e.g. to test against a specific baseline), use --benchmark <profile>; otherwise omit the flag and let kube-bench detect.

Option A: binary on the host (most thorough)

Running directly on the host — not inside a container — lets kube-bench read every file with root access.

On the k3s server (root)
# Download kube-bench (check https://github.com/aquasecurity/kube-bench/releases for latest)
curl -LO https://github.com/aquasecurity/kube-bench/releases/download/v0.9.0/kube-bench_0.9.0_linux_amd64.tar.gz
tar xzf kube-bench_0.9.0_linux_amd64.tar.gz

# Run (as root) — profile auto-detected from the running k3s version
./kube-bench 2>&1 | tee "kube-bench-$(date +%Y%m%d).txt"

# Review summary at the end:
tail -20 "kube-bench-$(date +%Y%m%d).txt"

Option B: in-cluster Job (convenient)

From your kubectl client (over WireGuard)
kubectl apply -f scripts/config/k3s/kube-bench-job.yaml
kubectl wait --for=condition=complete job/kube-bench-k3s -n kube-system --timeout=120s
kubectl logs -n kube-system job/kube-bench-k3s
kubectl delete -f scripts/config/k3s/kube-bench-job.yaml

Reading the results

Result | Meaning | Action
PASS | Control is compliant | None
WARN | Manual or informational control | Review; document your decision
FAIL | Non-compliant; remediation required | Follow the fix command in the output
INFO | Context only | None

After applying this chapter’s configuration, the expected results are:

  • Most API server, etcd, and kubelet checks: PASS
  • Audit logging: PASS (all four flags set)
  • PSA / pod security: PASS
  • Network policies: WARN (Layer 4 installs Cilium with network policies)
  • A few manual/informational WARNs that require operator documentation
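To compare a saved report against these expectations, the per-status totals can be counted straight from the file. This sketch assumes kube-bench's default text output, where each check line starts with its status in square brackets; kube_bench_summary is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Count result lines per status in a saved kube-bench report.
# Returns non-zero if the report contains at least one FAIL.
kube_bench_summary() {
  local report="$1" status
  for status in PASS FAIL WARN INFO; do
    # grep -c prints 0 but exits 1 when nothing matches, so ignore its exit code
    printf '%s %s\n' "$status" "$(grep -c "^\[$status\]" "$report" || true)"
  done
  ! grep -q '^\[FAIL\]' "$report"
}
```

Because the function's exit status reflects the presence of FAIL lines, it can gate a cron job or CI step: kube_bench_summary "kube-bench-$(date +%Y%m%d).txt" || echo "remediation needed".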

CIS Benchmark v1.12 coverage

CIS Control | Description | Status | Where
1.1.1 | API server file permissions | AUTO | k3s manages its own binary
1.2.1 | anonymous-auth=false | AUTO | k3s default
1.2.2 | No basic-auth-file | AUTO | Not supported in k3s
1.2.4 | Authorization mode not AlwaysAllow | AUTO | Node,RBAC default
1.2.5 | NodeRestriction admission | AUTO | k3s default
1.2.6 | Audit logging enabled | MANUAL | audit-policy.yaml + 4 apiserver args
1.2.7 | Audit policy file set | MANUAL | audit-policy.yaml
1.2.8 | EventRateLimit | MANUAL | psa.yaml + apiserver arg
1.2.9 | AlwaysPullImages | MANUAL | enable-admission-plugins
1.2.10 | PodSecurity admission | MANUAL | psa.yaml
1.2.12 | TLS cipher suites | MANUAL | tls-cipher-suites in config.yaml
1.2.13 | TLS minimum version | MANUAL | tls-min-version in config.yaml
1.2.15 | Service account key file | AUTO | k3s manages
1.2.16 | etcd CA/cert/key | AUTO | Embedded etcd
1.2.20 | No long-lived SA token extension | MANUAL | service-account-extend-token-expiration=false
1.2.21 | API request timeout | MANUAL | request-timeout=300s
1.3.1 | Terminated pod GC threshold | MANUAL | kube-controller-manager-arg
1.3.2 | Controller-manager profiling | N/A | Not exposed in k3s
1.4.1 | Scheduler profiling | N/A | Not exposed in k3s
2.1 | etcd TLS | AUTO | Embedded etcd always TLS
2.2 | etcd peer TLS | AUTO | Embedded etcd
2.4 | etcd data dir permissions | MANUAL | chmod 700 on /var/lib/rancher/k3s/server/db/etcd
2.7 | Encryption at rest | MANUAL | secrets-encryption: true
3.1.1 | Kubelet client cert auth | AUTO | k3s default
3.2.1 | Audit logs present | MANUAL | audit-policy.yaml + args
4.1.1 | Kubelet service file permissions | AUTO | systemd default
4.2.1 | Kubelet anonymous-auth=false | AUTO | k3s default
4.2.2 | Kubelet authorization webhook | AUTO | k3s default
4.2.3 | Kubelet client CA | AUTO | k3s default
4.2.4 | read-only-port=0 | MANUAL | kubelet-arg (CVE-2025-46599)
4.2.5 | Streaming connection idle timeout | MANUAL | kubelet-arg
4.2.6 | protect-kernel-defaults | MANUAL | Layer 0 sysctls + config.yaml
4.2.7 | make-iptables-util-chains | MANUAL | kubelet-arg
4.2.8 | event-qps | MANUAL | kubelet-arg
4.2.9 | kubelet TLS cipher suites | MANUAL | kubelet-arg
4.2.11 | Rotate server certificates | MANUAL | kubelet-arg
4.2.13 | seccomp-default | MANUAL | kubelet-arg
5.1.1 | cluster-admin binding audit | MANUAL | RBAC audit commands (§7.3 research)
5.1.2 | Minimize service account access | MANUAL | RBAC least-privilege
5.1.5 | Default SA automount disabled | MANUAL | post-install patch loop
5.2.x | Pod Security Standards | MANUAL | psa.yaml
5.3.1–5.3.2 | Network policies | MANUAL | Layer 4 (Cilium)

Ongoing maintenance

Key rotation

Rotate secrets encryption keys (run on the server)
# Rotate the encryption key for all Secrets in etcd.
# The old key is retained until you explicitly remove it, so no downtime.
k3s secrets-encrypt rotate-keys

# Check status:
k3s secrets-encrypt status

Do this quarterly, or after any suspected compromise.

Certificate expiry

k3s renews certificates automatically each time it starts if they are within 90 days of expiry (changed to 120 days in v1.33+), so a server that is never restarted can still let them lapse. To rotate manually:

Rotate k3s TLS certificates
systemctl stop k3s
k3s certificate rotate
systemctl start k3s
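To find out whether a rotation is due, the certificates on disk can be checked with openssl's -checkend option. This is a sketch under assumptions: certs_expiring_within is a hypothetical helper, the default directory is the usual k3s server TLS path, and only the first certificate in each .crt file is examined.

```shell
#!/usr/bin/env bash
# List certificate files that expire within the given number of days.
# $1: days; $2: directory to scan (defaults to the k3s server TLS directory).
certs_expiring_within() {
  local days="$1" dir="${2:-/var/lib/rancher/k3s/server/tls}" crt
  for crt in "$dir"/*.crt; do
    [ -e "$crt" ] || continue   # skip the literal pattern if nothing matched
    # -checkend exits non-zero when the certificate expires within N seconds
    openssl x509 -checkend "$((days * 86400))" -noout -in "$crt" >/dev/null \
      || echo "$crt"
  done
}
```

Running certs_expiring_within 90 as root prints nothing when every certificate has more than 90 days left.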

Regular audit queries

Weekly security checks
# Who has cluster-admin?
kubectl get clusterrolebindings \
  -o jsonpath='{range .items[?(@.roleRef.name=="cluster-admin")]}{.subjects}{"\n"}{end}'

# What can the default service account do? (expect only the baseline self-review and discovery entries)
kubectl auth can-i --list --as=system:serviceaccount:default:default

# Any failed events?
kubectl get events --field-selector reason=Failed -A

What this layer bought you

Secrets encrypted in the database. An attacker who steals a snapshot of etcd — a common post-exploit move — gets secretbox (XSalsa20-Poly1305) ciphertext, not your secrets.

A complete audit trail. Every exec into a container, every RBAC change, every token review is logged with timestamp and identity. You can answer “who ran that command?” weeks later.

Every container sandboxed. The restricted Pod Security Standard blocks the most common container-escape techniques before any code runs. No container can run as root or escalate privileges without an explicit exemption.

Port 10255 dead. CVE-2025-46599 is closed by both the k3s version and the explicit flag.

No unnecessary network components. Traefik, the built-in load balancer, local storage, and metrics-server are gone: four potential attack surfaces removed. The cluster is ready for Cilium (Layer 4), which will provide network policies instead.

Smaller credential blast radius. No container gets a Kubernetes API token unless its service account explicitly opts in. A compromised workload cannot pivot to the Kubernetes control plane.

Kube-bench green. You have a repeatable, automated way to prove compliance to yourself (and anyone else) on demand.