Supply Chain, Policy & Secrets

Only trusted code runs in your cluster, and secrets never leak — not in Git, not at rest, not in transit.

[Figure: Only trusted code runs, and secrets never sit in Git as plaintext.
Image path: developer git commit → CI build (GitHub Actions) → Trivy scan (CVE found = block) → cosign sign (keyless OIDC + SBOM) → registry (signed image) → Kyverno admission gate (verifyImages checks Rekor; unsigned or wrong registry = DENY) → pod admitted and runs in the cluster.
Secrets path (SOPS + age): admin laptop (age private key) → sops encrypt → ciphertext in Git → Flux pulls → decrypts in-cluster → Secret applied. The private key never touches Git or the cluster's running pods; only Flux's controller can decrypt.]
Only signed, scanned images run; secrets are encrypted before they ever reach Git.

You’ve hardened the host, locked down the network, and isolated containers in sandboxes. But there’s a question this chapter answers: who decides what’s even allowed to run? A cluster that will execute any container image, allow privileged workloads, and store passwords in plain text is still catastrophically unsafe no matter how good the network rules are.

This layer has three jobs:

  1. Policy — a gatekeeper that checks every workload at the moment it’s created and blocks anything that violates your rules.
  2. Supply chain — a chain of custody proving that every image came from your CI pipeline, was scanned for known vulnerabilities, and wasn’t tampered with in transit.
  3. Secrets — keeping passwords, tokens, and keys out of Git and encrypted at rest, with a workflow that survives cluster rebuilds.

Part 1 — Admission Control: the gatekeeper

What “admission control” means

In plain English

Admission control is a bouncer at the door of your cluster. Before any workload is allowed to start, the bouncer checks it against a list of rules. If it tries to run as root, use a host-level disk path, or pull from an untrusted registry, the door stays shut. The workload never runs; Kubernetes tells the caller exactly which rule it broke.

Without a policy engine, Kubernetes will happily schedule whatever you send it. Pod Security Admission (the built-in feature) helps, but it’s coarse-grained. We need finer control: restrict specific registries, require specific sandbox runtimes, and — crucially — block images that haven’t been cryptographically signed by your CI pipeline.

Why Kyverno (not OPA Gatekeeper)

Two mature options exist. The research found Kyverno to be the clear winner for a single-admin homelab:

| | Kyverno | OPA Gatekeeper |
|---|---|---|
| Policy language | YAML, same as the rest of Kubernetes | Rego, a specialised language with a steep learning curve |
| Image signature verification | Built-in verifyImages rule | No built-in; needs external tooling |
| Mutation (auto-fix) | Yes | Limited |
| Generate resources automatically | Yes | No |
| CNCF status | Incubating (accepted 2022) | OPA is Graduated |
| Memory (single-node) | ~150 MB | ~270 MB |

Gatekeeper is the right choice only if you already have a large investment in Rego policies. For everyone else, Kyverno’s YAML policies are readable, auditable, and actionable.

Native ValidatingAdmissionPolicy (CEL) — the webhook-free option

ValidatingAdmissionPolicy, generally available since Kubernetes 1.30, uses a simple expression language (CEL) built directly into the API server: no webhook, no extra deployment. It’s ideal for simple checks like “all containers must have resource limits.” For anything more complex (image signature verification, mutation, resource generation), you still need Kyverno.

Note

You can see an example CEL policy in the comments of the files under scripts/config/supplychain/kyverno-policies/, but Kyverno is the primary engine here. Kyverno 1.12+ can even emit ValidatingAdmissionPolicy objects from its own policies, giving you both.
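As a rough illustration of the "all containers must have resource limits" check, the sketch below writes a ValidatingAdmissionPolicy and its binding to a file. The object names (require-limits-cel, require-limits-cel-binding) and the output file name are hypothetical and do not come from the repository; the binding as written would apply cluster-wide, which you would likely narrow before using it.

```shell
# Write a CEL-based policy that rejects pods whose containers lack CPU or memory limits.
# Assumes Kubernetes 1.30+ (admissionregistration.k8s.io/v1). Names are illustrative only.
cat > require-limits-vap.yaml <<'EOF'
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-limits-cel
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    - expression: >-
        object.spec.containers.all(c,
          has(c.resources) && has(c.resources.limits) &&
          'cpu' in c.resources.limits && 'memory' in c.resources.limits)
      message: "Every container must set CPU and memory limits."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-limits-cel-binding
spec:
  policyName: require-limits-cel
  validationActions: ["Deny"]
EOF
echo "wrote require-limits-vap.yaml"
```

Applying it would be a separate step (kubectl apply -f require-limits-vap.yaml) once you have checked it against your own cluster version.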


Installing Kyverno

The script scripts/cluster/23-policy-kyverno.sh handles this end-to-end. What it does:

  1. Adds the Kyverno Helm repo and installs the Kyverno engine into the kyverno namespace.
  2. Installs kyverno-policies at the restricted Pod Security Standard level — this is roughly 15 pre-built policies covering the most dangerous attack vectors.
  3. Applies the six custom cluster-policies defined in scripts/config/supplychain/kyverno-policies/.
Run as root on the cluster host
sudo bash scripts/cluster/23-policy-kyverno.sh

Important

Kyverno runs as an admission webhook. Until it is healthy (all pods Running), the API server will reject new workloads instead of admitting them unchecked. The script waits for the rollout to complete before proceeding.

The six custom policies and what each one blocks

1. Disallow privileged containers — disallow-privileged.yaml

In plain English

“Privileged mode” means the container has the same access to the host kernel as if it were running directly on the machine, with no walls at all. This policy blocks that unconditionally.

A container running with privileged: true can load kernel modules, access all devices, and escape every namespace boundary. It’s the single most dangerous container setting. This policy blocks it for all containers, init containers, and ephemeral containers.
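A policy of this kind might look like the sketch below, modelled on the pattern-style rules in Kyverno's public sample policies. It is not the repository's disallow-privileged.yaml; the policy and rule names are assumptions, and newer Kyverno releases prefer a per-rule failureAction over the spec-level field used here.

```shell
# Write an illustrative Kyverno ClusterPolicy that rejects privileged containers.
# The =( ) anchors mean "if this field is present, it must match".
cat > disallow-privileged-sketch.yaml <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: privileged-containers
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged mode is not allowed."
        pattern:
          spec:
            =(ephemeralContainers):
              - =(securityContext):
                  =(privileged): "false"
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
EOF
echo "wrote disallow-privileged-sketch.yaml"
```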

2. Require resource limits — require-limits.yaml

In plain English

Without resource limits, one runaway process can eat all memory on the node and crash everything else. This policy forces every container to declare a ceiling.

Every container must declare both a CPU limit and a memory limit. Without this, a single compromised or buggy workload can trigger an out-of-memory condition that brings down the entire node — a cheap denial-of-service attack.
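One way to express this rule in Kyverno is a pattern that requires non-empty cpu and memory limits on every container. The sketch below is an assumption about the shape of such a policy, not a copy of require-limits.yaml; the policy and rule names are made up for illustration.

```shell
# Write an illustrative Kyverno ClusterPolicy requiring CPU and memory limits.
# "?*" matches any non-empty value, so an absent or empty limit fails validation.
cat > require-limits-sketch.yaml <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-limits
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-cpu-and-memory-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Every container must declare CPU and memory limits."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    cpu: "?*"
                    memory: "?*"
EOF
echo "wrote require-limits-sketch.yaml"
```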

3. Require read-only root filesystem, non-root user, and dropped capabilities — require-ro-rootfs-nonroot-drop-caps.yaml

In plain English

Three rules in one: (a) the container’s main disk area is read-only, so malware can’t write files to disguise itself; (b) processes run as an unprivileged user, not root; (c) all Linux “superpowers” (capabilities like NET_ADMIN, SYS_PTRACE) are stripped. A compromised container can do very little damage.
  • readOnlyRootFilesystem: true — file system writes must go to explicitly mounted volumes.
  • runAsNonRoot: true — processes cannot run as UID 0.
  • allowPrivilegeEscalation: false — the process cannot gain new privileges (prevents setuid escalation).
  • capabilities.drop: [ALL] — all Linux capabilities are stripped; add back only specific ones if the workload genuinely needs them.

The policy also disallows hostPath volumes (direct access to the host filesystem) and hostNetwork: true (sharing the host’s network stack).
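To make the settings concrete, here is a sketch of a pod that should satisfy the rules above. The pod name, image reference, UID and the /tmp scratch volume are all hypothetical; a read-only root filesystem usually means the application needs an explicit writable volume like this one.

```shell
# Write an example pod manifest that meets the read-only, non-root, drop-all rules.
cat > hardened-pod-example.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001                 # any non-zero UID; illustrative value
  containers:
    - name: app
      image: ghcr.io/your-org/app:1.0.0   # hypothetical image
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          cpu: 500m
          memory: 256Mi
      volumeMounts:
        - name: tmp
          mountPath: /tmp            # writable scratch space outside the root filesystem
  volumes:
    - name: tmp
      emptyDir: {}
EOF
echo "wrote hardened-pod-example.yaml"
```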

4. Restrict image registries — restrict-registries.yaml

In plain English

Docker Hub has millions of images, most unmaintained. This policy says: we only allow images from our short approved list. Pull from anywhere else and admission is denied.

Images must come from an allowlist. Edit the policy to match your setup — the default template allows registry.homelab.local (your private registry), ghcr.io (GitHub Container Registry), and cgr.dev (Chainguard hardened images). Anything else — including raw docker.io pull-throughs — is blocked at admission.
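An allowlist for the three registries named above could be sketched as follows. The structure follows Kyverno's pattern syntax, where "|" separates alternatives; the policy and rule names are assumptions, and the real restrict-registries.yaml may differ.

```shell
# Write an illustrative Kyverno ClusterPolicy that only admits images from three registries.
cat > restrict-registries-sketch.yaml <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-registries
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: allowed-registries-only
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from registry.homelab.local, ghcr.io, or cgr.dev."
        pattern:
          spec:
            =(ephemeralContainers):
              - image: "registry.homelab.local/* | ghcr.io/* | cgr.dev/*"
            =(initContainers):
              - image: "registry.homelab.local/* | ghcr.io/* | cgr.dev/*"
            containers:
              - image: "registry.homelab.local/* | ghcr.io/* | cgr.dev/*"
EOF
echo "wrote restrict-registries-sketch.yaml"
```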

5. Require gVisor RuntimeClass in the untrusted namespace — require-runtimeclass-untrusted.yaml

Any pod scheduled in a namespace labelled untrusted must declare runtimeClassName: gvisor. This enforces the gVisor sandbox (see Layer 3) for tenant workloads you don’t fully control. If a workload doesn’t set it, admission is denied — it cannot accidentally run with the default runc runtime.

Note

You must have gVisor installed (Layer 3 script) and the gvisor RuntimeClass created before this policy can match successfully. The script creates the RuntimeClass as part of the policy apply step.
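The pairing of RuntimeClass and policy might be sketched like this. Two things here are assumptions: the handler name runsc (it must match whatever your containerd configuration calls the gVisor runtime), and matching on a namespace literally named untrusted rather than on a namespace label, since the label key is not given in this chapter.

```shell
# Write an illustrative RuntimeClass plus a Kyverno policy that requires it.
cat > require-runtimeclass-sketch.yaml <<'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                       # assumed containerd handler name for gVisor
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-runtimeclass-untrusted
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: require-gvisor
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - untrusted          # assumption: matched by name, not by label
      validate:
        message: "Pods in the untrusted namespace must set runtimeClassName: gvisor."
        pattern:
          spec:
            runtimeClassName: gvisor
EOF
echo "wrote require-runtimeclass-sketch.yaml"
```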

6. Block unsigned images — verify-images-cosign.yaml

Important

This is the most powerful policy in this chapter. Every image from ghcr.io/your-org/* must have a valid cosign signature from your GitHub Actions workflow. If the signature is missing or invalid, the pod is rejected, even if the image tag and digest are correct.

This policy ties the supply chain (see Part 2) to admission control. The signature is checked against Sigstore’s Rekor transparency log; the subject must be your specific workflow file on your specific branch. Re-pointing the tag at a different image, or tampering with the image after signing, produces a digest with no valid signature, so the pod is blocked.

The policy in verify-images-cosign.yaml uses keyless verification — no public key to manage or rotate. The trust anchor is GitHub’s OIDC issuer and the Rekor log.
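A keyless verifyImages rule could be sketched as below. The subject reuses the placeholder workflow path from this chapter (myorg/myrepo, build.yaml, main) and the image pattern ghcr.io/your-org/*; both must be replaced with your own values, and the policy name and timeout are assumptions and not taken from verify-images-cosign.yaml.

```shell
# Write an illustrative Kyverno verifyImages policy using keyless (Fulcio + Rekor) verification.
cat > verify-images-sketch.yaml <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-images-cosign
spec:
  validationFailureAction: Enforce
  background: false
  webhookTimeoutSeconds: 30          # signature checks need network round trips
  rules:
    - name: verify-keyless-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "ghcr.io/your-org/*"
          attestors:
            - entries:
                - keyless:
                    # Placeholder identity: your workflow file on your branch.
                    subject: "https://github.com/myorg/myrepo/.github/workflows/build.yaml@refs/heads/main"
                    issuer: "https://token.actions.githubusercontent.com"
                    rekor:
                      url: https://rekor.sigstore.dev
EOF
echo "wrote verify-images-sketch.yaml"
```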


Part 2 — Supply chain: proving the image is what you think it is

The threat this solves

In plain English

Imagine someone replaces a package inside a Docker image after you built it, then pushes it to your registry under the same name. Without supply-chain controls, your cluster pulls and runs the tampered image. With them, Kyverno checks the cryptographic signature, and the tampered image has no valid signature, so it never runs.

cosign keyless signing — no long-lived keys

Traditional image signing uses a private key you must store securely, rotate, and never lose. Cosign keyless eliminates the key entirely by using the CI job’s own identity:

  1. GitHub Actions proves it’s running your workflow (via GitHub’s OIDC token).
  2. Sigstore’s Fulcio CA issues a short-lived certificate binding that identity to a public key.
  3. cosign signs the image using that ephemeral key and records the signature in Rekor, a public, append-only transparency log.
  4. The certificate (and thus the key) expires in 10 minutes. No long-lived secret exists anywhere.

When Kyverno verifies the image later, it checks: “Is there an entry in Rekor signed by a certificate from Fulcio that says this digest came from github.com/myorg/myrepo/.github/workflows/build.yaml on main?” If yes, the image is trusted.
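The same question can be asked by hand with cosign verify. The sketch below only builds the expected certificate identity and prints the command instead of running it, so it works without cosign installed; the org, repo, workflow and branch are the placeholders used above, and the digest is deliberately left unfilled.

```shell
# Build the certificate identity that a keyless signature from a GitHub Actions workflow carries.
# $1 = org/repo, $2 = workflow file name, $3 = branch
workflow_identity() {
  printf 'https://github.com/%s/.github/workflows/%s@refs/heads/%s\n' "$1" "$2" "$3"
}

identity="$(workflow_identity myorg/myrepo build.yaml main)"
issuer="https://token.actions.githubusercontent.com"

# Print the verification command; substitute a real digest before running it yourself.
printf 'cosign verify --certificate-identity %s --certificate-oidc-issuer %s %s\n' \
  "$identity" "$issuer" 'ghcr.io/myorg/myrepo@sha256:<digest>'
```

Pinning both the identity and the issuer matters: the issuer alone would accept a signature from any GitHub Actions workflow in any repository.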

SBOM: the ingredient list for your image

A Software Bill of Materials (SBOM) is a machine-readable list of every library and dependency inside the image — think of it as a nutrition label for software. syft generates this list in the CycloneDX or SPDX format, and cosign attaches it to the image in the registry as an OCI attestation (same content-addressable layer system as the image itself).

Benefits:

  • If a new CVE drops (e.g., log4j), you can query your SBOM store to know instantly which images are affected, without rescanning everything.
  • Trivy (running in CI and in-cluster via Trivy Operator) reads the SBOM to produce the vulnerability report.
  • This combination reaches SLSA Level 2: the image has provenance from a hosted build system.

| SLSA Level | What it means | How we achieve it |
|---|---|---|
| L1 | Provenance document exists | cosign attest + syft SBOM in CI |
| L2 | Provenance from a hosted build platform | GitHub Actions + keyless cosign |
| L3 | Hardened build platform (tamper-resistant) | GitHub Actions hardened runners |
| L4 | Hermetic, fully reproducible build | Advanced; skip for homelab |
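To illustrate the "which images are affected?" query, the sketch below searches a local directory of CycloneDX files for a component name. The sboms/ directory and both sample documents are invented for the demonstration; a real SBOM store would more likely be queried with a JSON-aware tool, since plain grep depends on exact whitespace.

```shell
# Create two tiny, made-up CycloneDX documents to search.
mkdir -p sboms
cat > sboms/app-a.cdx.json <<'EOF'
{"bomFormat": "CycloneDX", "specVersion": "1.5", "components": [
  {"type": "library", "name": "log4j-core", "version": "2.14.1"}]}
EOF
cat > sboms/app-b.cdx.json <<'EOF'
{"bomFormat": "CycloneDX", "specVersion": "1.5", "components": [
  {"type": "library", "name": "openssl", "version": "3.0.13"}]}
EOF

# List every SBOM (and therefore every image) that contains the vulnerable component.
affected="$(grep -l '"name": "log4j-core"' sboms/app-a.cdx.json sboms/app-b.cdx.json)"
echo "affected SBOMs: $affected"
```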

The CI pipeline (GitHub Actions)

The file scripts/config/supplychain/cosign-sign-ci-example.yaml contains the full workflow. Key steps:

  1. Build and push the image to ghcr.io, capturing the exact digest (sha256:...).
  2. Install cosign via sigstore/cosign-installer.
  3. Sign the image digest (not just the tag) with cosign sign --yes. No key needed — GitHub provides the OIDC token.
  4. Generate SBOM with syft in CycloneDX-JSON format.
  5. Attest the SBOM with cosign attest, storing it as an OCI attestation alongside the image.

Tip

Always sign the digest, not the tag. A tag like v1.2.3 is mutable: someone can push a different image under the same tag. A digest (@sha256:abc123...) is immutable. The Kyverno verifyImages policy checks the digest.
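The steps above can be sketched as a workflow file. This is not the repository's cosign-sign-ci-example.yaml; the workflow name, the action versions and the image naming (ghcr.io plus the repository name, which must be lowercase for ghcr.io) are assumptions, and the Trivy scan step is left out to keep the sketch short.

```shell
# Write an illustrative GitHub Actions workflow: build, push, sign the digest, attest the SBOM.
mkdir -p .github/workflows
cat > .github/workflows/build-sign-sketch.yaml <<'EOF'
name: build-sign-sketch
on:
  push:
    branches: [main]
permissions:
  contents: read
  packages: write      # push to ghcr.io
  id-token: write      # OIDC token for keyless signing
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      IMAGE: ghcr.io/${{ github.repository }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ env.IMAGE }}:${{ github.sha }}
      - uses: sigstore/cosign-installer@v3
      - uses: anchore/sbom-action/download-syft@v0
      - name: Sign the image digest
        env:
          DIGEST: ${{ steps.build.outputs.digest }}
        run: cosign sign --yes "${IMAGE}@${DIGEST}"
      - name: Generate and attest the SBOM
        env:
          DIGEST: ${{ steps.build.outputs.digest }}
        run: |
          syft "${IMAGE}@${DIGEST}" -o cyclonedx-json=sbom.cdx.json
          cosign attest --yes --type cyclonedx --predicate sbom.cdx.json "${IMAGE}@${DIGEST}"
EOF
echo "wrote .github/workflows/build-sign-sketch.yaml"
```

Without the id-token: write permission the job cannot obtain an OIDC token, and keyless signing fails.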

Private registry: Harbor or registry:2

In plain English

A private registry is your own internal copy of Docker Hub. Images you’ve built and vetted live here, and your cluster only pulls from here (enforced by the restrict-registries policy). Nothing comes from the public internet at runtime.

Harbor (CNCF Graduated) is the full-featured choice: built-in Trivy scanning, SBOM storage, cosign integration, a web UI, and pull-through caching of upstream registries.

registry:2 (the official Docker Distribution binary) is the minimal choice: it’s a single container, uses almost no RAM, and can be configured as a pull-through cache.

| | Harbor | registry:2 |
|---|---|---|
| RAM footprint | ~1 GB | ~50 MB |
| Built-in scanning | Yes (Trivy) | No |
| SBOM / cosign UI | Yes | No |
| Pull-through cache | Yes | Yes |
| Use when | ≥ 8 GB RAM | RAM-constrained |

Scanning and signing still happen in CI regardless of which registry you use — Harbor adds a visual layer on top.


Part 3 — Secrets: keeping sensitive values out of Git and encrypted at rest

Two problems, two solutions

Problem 1: secrets at rest in the cluster. Kubernetes stores its Secret objects in its database (the embedded etcd datastore configured in Layer 2). Without encryption, anyone who can read the datastore files on disk, or who steals a cluster backup, has all your secrets in plain text.

Solution: k3s --secrets-encryption. One flag at install time. k3s wraps every Secret object with secretbox (XSalsa20-Poly1305 encryption) before writing it to disk. The encryption key is stored in /var/lib/rancher/k3s/server/cred/encryption-config.json — protect this file like a root password.

Check your k3s config (should already be set by the k3s install script)
grep secrets-encryption /etc/rancher/k3s/config.yaml
# should show:  secrets-encryption: true

# Verify the encryption config was written
cat /var/lib/rancher/k3s/server/cred/encryption-config.json
# shows the secretbox provider and key material
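If you want this check in a script instead of reading the output by eye, a small helper along these lines would do. The function name is made up, and the demonstration runs against a throwaway file so it can be tried anywhere; on the cluster host you would pass /etc/rancher/k3s/config.yaml instead.

```shell
# Return 0 when the given k3s config file enables secrets encryption, non-zero otherwise.
secrets_encryption_enabled() {
  grep -Eq '^[[:space:]]*secrets-encryption:[[:space:]]*true[[:space:]]*$' "$1"
}

# Demonstration against a temporary file, not the real config.
demo_cfg="$(mktemp)"
printf 'secrets-encryption: true\n' > "$demo_cfg"

if secrets_encryption_enabled "$demo_cfg"; then
  echo "secrets encryption: enabled"
else
  echo "secrets encryption: NOT enabled" >&2
fi
```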

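Rotate the encryption key quarterly:

On the cluster host (root)
k3s secrets-encrypt status        # check the current state before rotating
k3s secrets-encrypt rotate-keys   # generates a new key and re-encrypts all secrets with it
k3s secrets-encrypt status        # confirms re-encryption finished and all nodes are in sync
# On k3s releases without rotate-keys, use the separate prepare, rotate and reencrypt
# subcommands instead, restarting k3s between steps.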

Problem 2: secrets in Git (GitOps). If you store Kubernetes manifest files in a Git repository (which you should, for repeatability), you cannot put raw Secret values in those files. Base64 is not encryption — anyone who can read the repo can read the secrets.

In plain English

Imagine you have a padlock (the age public key) that anyone can close, but only you have the key to open. You click the padlock shut on your secrets file before committing to Git. The locked file is safe to share publicly. Only the person with the physical key (your laptop’s age private key, and your CI) can open it.

age is a modern, simple encryption tool. SOPS is a tool that encrypts YAML/JSON files in-place — it leaves the structure readable (you can see that a key is called password) but encrypts the value. This means git diff and code review still work.

Why not Sealed Secrets? Sealed Secrets is a good alternative (see scripts/config/supplychain/sops-age-example.md for a comparison), but its decryption key lives inside the cluster — if you lose the cluster before backing up that key, you can never recover your sealed secrets. The age private key lives on your laptop and in CI, entirely independent of the cluster.
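A minimal .sops.yaml for this setup might look like the sketch below. The gitops-demo directory, the path pattern and the age recipient are placeholders (the recipient must be replaced by the public key that age-keygen prints); the encrypted_regex restricts encryption to the data and stringData fields so the rest of the manifest stays readable in diffs.

```shell
# Write an illustrative SOPS config into a demo directory standing in for the GitOps repo root.
mkdir -p gitops-demo
cat > gitops-demo/.sops.yaml <<'EOF'
creation_rules:
  - path_regex: .*secret.*\.yaml$
    encrypted_regex: ^(data|stringData)$
    age: age1replacewithyourownpublickey   # placeholder, not a real recipient
EOF
echo "wrote gitops-demo/.sops.yaml"

# Typical follow-up on the admin laptop (not run here):
#   age-keygen -o ~/.config/sops/age/keys.txt
#   sops --encrypt --in-place my-secret.yaml
```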

The key flow

  1. Admin laptop: generate one age key pair, store the private key at ~/.config/sops/age/keys.txt. The public key goes in .sops.yaml in the GitOps repo root.
  2. CI (GitHub Actions): store the same age private key as a repository secret (AGE_SECRET_KEY). CI uses it to decrypt during dry-run validation.
  3. Flux in the cluster: the age private key is loaded once into a Kubernetes Secret in flux-system. Flux uses it to decrypt SOPS-encrypted manifests before applying them. The private key never leaves these three places — it is never checked into Git.

Full step-by-step is in scripts/config/supplychain/sops-age-example.md.
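On the Flux side, decryption is switched on per Kustomization. The sketch below assumes a Kustomization named apps reading ./apps from the flux-system GitRepository; those names and the interval are illustrative, while sops-age is the Secret name used in this chapter.

```shell
# Write an illustrative Flux Kustomization that decrypts SOPS-encrypted manifests.
cat > flux-kustomization-sketch.yaml <<'EOF'
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps                  # hypothetical name
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps                # hypothetical path in the GitOps repo
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      name: sops-age          # Secret holding the age private key
EOF
echo "wrote flux-kustomization-sketch.yaml"

# Loading the key into the cluster is a one-time step (not run here); Flux expects
# the key file entry to end in .agekey:
#   kubectl create secret generic sops-age --namespace=flux-system \
#     --from-file=age.agekey="$HOME/.config/sops/age/keys.txt"
```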

Hard rules for secrets

These rules apply to every workload in the cluster, no exceptions:

Hardened
  • Never store secret values in a ConfigMap, in pod annotations or inline environment variable values, or in image layers.
  • Mount Secrets as files (volumes), not environment variables. Environment variables are readable from /proc/<pid>/environ, are inherited by child processes, and tend to end up in crash dumps and debug logs.
  • Set readOnly: true on all Secret volume mounts.
  • Never use envFrom pointing to a Secret in a pod spec from a public image — the Secret name is exposed in the spec and can be harvested.
  • The age private key lives in exactly three places: your laptop, CI secrets, and the flux-system/sops-age Kubernetes Secret. Nowhere else.
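As an illustration of the volume-mount rule, the sketch below writes a pod manifest that reads a Secret from files under a read-only mount. The pod name, image, Secret name (db-credentials) and mount path are all hypothetical.

```shell
# Write an example pod manifest that mounts a Secret as read-only files.
cat > secret-volume-example.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: secret-file-example
spec:
  containers:
    - name: app
      image: ghcr.io/your-org/app:1.0.0   # hypothetical image
      volumeMounts:
        - name: db-credentials
          mountPath: /etc/secrets/db      # each Secret key becomes a file here
          readOnly: true
  volumes:
    - name: db-credentials
      secret:
        secretName: db-credentials        # hypothetical Secret
EOF
echo "wrote secret-volume-example.yaml"
```

The application then reads, for example, /etc/secrets/db/password from disk, and nothing secret appears in its environment.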

Part 4 — GitOps with Flux v2

What GitOps means

In plain English

Instead of running kubectl apply by hand, you commit your desired cluster state to a Git repository. A controller (Flux) watches that repo and automatically applies changes. The Git history becomes the audit log of every change ever made to the cluster. Rolling back is a git revert.

Why Flux over ArgoCD

Both are production-grade GitOps controllers. For a single-admin hardened homelab:

| | Flux v2 | ArgoCD |
|---|---|---|
| Memory | ~150–200 MB | ~400–600 MB |
| SOPS native | Yes, first-class .decryption field | Via plugin |
| UI | CLI-only (smaller attack surface) | Rich web UI |
| Multi-cluster | Possible, somewhat manual | Excellent, native |
| ARM64/Pi | Yes | Yes |

Flux wins on: footprint, SOPS-native decryption, and smaller attack surface (no UI to secure). Choose ArgoCD if you need the UI or manage multiple clusters.

Flux bootstrap

The script scripts/cluster/26-supplychain.sh handles the optional Flux bootstrap (set FLUX_ENABLED=1 to activate). The full manual steps and the egress NetworkPolicy that locks down the flux-system namespace are in scripts/config/supplychain/flux-bootstrap.md.

Important

The flux-system NetworkPolicy restricts Flux’s egress to: your Git host (port 443), your container registry (port 443), the Kubernetes API server (port 6443), and DNS (port 53/UDP). Flux has no business talking to anything else.
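A port-level version of that egress policy could be sketched as follows. It is not the policy from flux-bootstrap.md: a standard NetworkPolicy cannot match hostnames, so pinning traffic to your specific Git host and registry would need ipBlock CIDRs or a CNI with FQDN rules. The policy name is made up, and the intra-namespace rule is an assumption added because Flux controllers fetch artifacts from one another.

```shell
# Write an illustrative egress NetworkPolicy for the flux-system namespace.
cat > flux-egress-sketch.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: flux-egress
  namespace: flux-system
spec:
  podSelector: {}              # applies to every pod in flux-system
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector: {}      # controller-to-controller traffic inside the namespace
    - ports:
        - protocol: TCP
          port: 443            # Git host and container registry
        - protocol: TCP
          port: 6443           # Kubernetes API server
        - protocol: UDP
          port: 53             # DNS
EOF
echo "wrote flux-egress-sketch.yaml"
```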

After bootstrap, verify that Flux can see and reconcile your repo:

On the admin laptop
flux get sources git
flux get kustomizations
flux logs --follow --level=error   # watch for reconciliation errors

Putting it all together: the chain of custody

Every workload in your cluster travels this chain:

  1. Code is committed and a CI build starts.
  2. Trivy scans the image for known vulnerabilities. A critical-severity finding blocks the build.
  3. The image is pushed to the registry (ghcr.io or Harbor) and cosign signs its digest with a keyless signature tied to the specific workflow and branch. Rekor records it.
  4. syft generates a CycloneDX SBOM and cosign attests it alongside the image.
  5. The GitOps manifests are updated with the new digest and SOPS-encrypted secrets are committed.
  6. Flux detects the change, decrypts any encrypted secrets using the age key, and sends the manifests to the Kubernetes API.
  7. Kyverno’s verifyImages policy checks the signature against Rekor. If valid: the pod is admitted. If missing or invalid: admission is denied, the pod never starts, and an audit event is logged.

What this layer bought you

| Capability | What it prevents |
|---|---|
| Kyverno restricted PSS | Privileged containers, hostPath mounts, host networking, missing security contexts |
| Resource limits enforcement | Node-level denial-of-service from runaway workloads |
| Registry allowlist | Images pulled from untrusted or compromised public registries |
| gVisor RuntimeClass enforcement | Unintended sandbox escapes in the untrusted namespace |
| cosign keyless signing + Kyverno verifyImages | Tampered images, images built outside CI, supply-chain attacks |
| syft SBOM attestations | Unknown dependencies; enables instant CVE blast-radius queries |
| k3s --secrets-encryption | Secrets exposed via cluster database file or backup theft |
| SOPS + age | Secrets committed to Git in plain text; secrets lost if cluster is rebuilt |
| Flux NetworkPolicy | Compromised Flux controller exfiltrating data to arbitrary endpoints |

A cluster that completes this layer will run only signed, scanned images from approved registries, enforce that all workloads run as unprivileged read-only processes with resource caps, keep secrets encrypted at rest and in Git, and reconcile state from a version-controlled repository with a full audit trail.