Supply Chain, Policy & Secrets
Only trusted code runs in your cluster, and secrets never leak — not in Git, not at rest, not in transit.
You’ve hardened the host, locked down the network, and isolated containers in sandboxes. But there’s a question this chapter answers: who decides what’s even allowed to run? A cluster that will execute any container image, allow privileged workloads, and store passwords in plain text is still catastrophically unsafe no matter how good the network rules are.
This layer has three jobs:
- Policy — a gatekeeper that checks every workload at the moment it’s created and blocks anything that violates your rules.
- Supply chain — a chain of custody proving that every image came from your CI pipeline, was scanned for known vulnerabilities, and wasn’t tampered with in transit.
- Secrets — keeping passwords, tokens, and keys out of Git and encrypted at rest, with a workflow that survives cluster rebuilds.
Part 1 — Admission Control: the gatekeeper
What “admission control” means
Admission control is a bouncer at the door of your cluster. Before any workload is allowed to start, the bouncer checks it against a list of rules. If it tries to run as root, use a host-level disk path, or pull from an untrusted registry — the door stays shut. The workload never runs; Kubernetes tells the caller exactly which rule it broke.
Without a policy engine, Kubernetes will happily schedule whatever you send it. Pod Security Admission (the built-in feature) helps, but it’s coarse-grained. We need finer control: restrict specific registries, require specific sandbox runtimes, and — crucially — block images that haven’t been cryptographically signed by your CI pipeline.
Why Kyverno (not OPA Gatekeeper)
Two mature options exist. The research found Kyverno to be the clear winner for a single-admin homelab:
| | Kyverno | OPA Gatekeeper |
|---|---|---|
| Policy language | YAML — same as the rest of Kubernetes | Rego — a specialised language with a steep learning curve |
| Image signature verification | Built-in verifyImages rule | No built-in; needs external tooling |
| Mutation (auto-fix) | Yes | Limited |
| Generate resources automatically | Yes | No |
| CNCF status | Incubating (since 2022) | OPA is Graduated |
| Memory (single-node) | ~150 MB | ~270 MB |
Gatekeeper is the right choice only if you already have a large investment in Rego policies. For everyone else, Kyverno’s YAML policies are readable, auditable, and actionable.
Native ValidatingAdmissionPolicy (CEL) — the webhook-free option
ValidatingAdmissionPolicy, which reached GA in Kubernetes 1.30, uses a simple expression language (CEL) built directly into the API server: no webhook, no extra deployment. It’s ideal for simple checks like “all containers must have resource limits.” For anything more complex (image signature verification, mutation, resource generation), you still need Kyverno.
You can see an example CEL policy in scripts/config/supplychain/kyverno-policies/ comments, but Kyverno is the primary engine here. Kyverno 1.12+ can even emit ValidatingAdmissionPolicy objects from its own policies, giving you both.
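A minimal sketch of the “all containers must have resource limits” check as a CEL policy, assuming the stable admissionregistration.k8s.io/v1 API. The policy and binding names are hypothetical, and this is not the example shipped in the repo:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-container-limits          # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    # Every regular container must declare both a CPU and a memory limit.
    - expression: >-
        object.spec.containers.all(c,
          has(c.resources) && has(c.resources.limits) &&
          'cpu' in c.resources.limits && 'memory' in c.resources.limits)
      message: "Every container must set CPU and memory limits."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-container-limits-binding  # hypothetical name
spec:
  policyName: require-container-limits
  validationActions: ["Deny"]             # reject, rather than only warn or audit
```

A policy does nothing until a binding references it, which is why the two objects travel together. This sketch only inspects Pods, so a Deployment with a bad template is accepted and its Pods are rejected when the ReplicaSet tries to create them.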
Installing Kyverno
The script scripts/cluster/23-policy-kyverno.sh handles this end-to-end. What it does:
- Adds the Kyverno Helm repo and installs the Kyverno engine into the kyverno namespace.
- Installs kyverno-policies at the restricted Pod Security Standard level: roughly 15 pre-built policies covering the most dangerous attack vectors.
- Applies the six custom cluster policies defined in scripts/config/supplychain/kyverno-policies/.
sudo bash scripts/cluster/23-policy-kyverno.sh
Kyverno runs as an admission webhook. Until it is healthy (all pods Running), the API server rejects new workloads rather than admitting them unchecked, because Kyverno’s webhooks fail closed by default. The script waits for the rollout to complete before proceeding.
The six custom policies and what each one blocks
1. Disallow privileged containers — disallow-privileged.yaml
“Privileged mode” means the container has the same access to the host kernel as if it were running directly on the machine — no walls at all. This policy blocks that unconditionally.
A container running with privileged: true can load kernel modules, access all devices, and escape every namespace boundary. It’s the single most dangerous container setting. This policy blocks it for all containers, init containers, and ephemeral containers.
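For illustration, a policy along these lines could look like the sketch below. It follows the pattern style of the public Kyverno policy library and is not copied from disallow-privileged.yaml; the rule name and message are assumptions.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged
spec:
  validationFailureAction: Enforce   # block the request instead of only auditing it
  background: true
  rules:
    - name: privileged-containers    # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged mode is not allowed."
        pattern:
          spec:
            # =(field) means: only check this field if it is present.
            =(ephemeralContainers):
              - =(securityContext):
                  =(privileged): "false"
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
```

The conditional anchors let a container omit securityContext entirely (which defaults to unprivileged) while still rejecting an explicit privileged: true. Recent Kyverno releases also accept the failure action per rule, so check the syntax against the version you run.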
2. Require resource limits — require-limits.yaml
Without resource limits, one runaway process can eat all memory on the node and crash everything else. This policy forces every container to declare a ceiling.
Every container must declare both a CPU limit and a memory limit. Without this, a single compromised or buggy workload can trigger an out-of-memory condition that brings down the entire node — a cheap denial-of-service attack.
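A hedged sketch of how such a rule might be expressed, again in the Kyverno pattern style rather than as a copy of require-limits.yaml; the rule name and message are assumptions:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-limits
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-cpu-and-memory-limits   # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Every container must set resources.limits.cpu and resources.limits.memory."
        pattern:
          spec:
            # Init containers are checked only when the Pod defines any.
            =(initContainers):
              - resources:
                  limits:
                    cpu: "?*"      # "?*" matches any non-empty value
                    memory: "?*"
            containers:
              - resources:
                  limits:
                    cpu: "?*"
                    memory: "?*"
```

Ephemeral containers are left out on purpose: Kubernetes does not allow them to declare resources, so a limits rule cannot apply to them.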
3. Require read-only root filesystem, non-root user, and dropped capabilities — require-ro-rootfs-nonroot-drop-caps.yaml
Three rules in one: (a) the container’s main disk area is read-only, so malware can’t write files to disguise itself; (b) processes run as an unprivileged user, not root; (c) all Linux “superpowers” (capabilities like NET_ADMIN, SYS_PTRACE) are stripped. A compromised container can do very little damage.
- readOnlyRootFilesystem: true. Filesystem writes must go to explicitly mounted volumes.
- runAsNonRoot: true. Processes cannot run as UID 0.
- allowPrivilegeEscalation: false. The process cannot gain new privileges (prevents setuid escalation).
- capabilities.drop: [ALL]. All Linux capabilities are stripped; add back only specific ones if the workload genuinely needs them.
Also disallows hostPath volumes (direct access to the host filesystem) and hostNetwork: true (sharing the host’s network stack).
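To make the requirements concrete, here is a sketch of a Pod that should satisfy the settings listed above. The Pod name, image, UID, and limit values are placeholders, not values taken from the repo:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example                  # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001                      # any non-zero UID the image supports
    seccompProfile:
      type: RuntimeDefault                # expected by the restricted Pod Security Standard
  containers:
    - name: app
      image: registry.homelab.local/example/app:1.0.0   # placeholder image
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          cpu: 500m
          memory: 256Mi
      volumeMounts:
        - name: tmp
          mountPath: /tmp                 # writable scratch space on an otherwise read-only filesystem
  volumes:
    - name: tmp
      emptyDir: {}
```

The emptyDir mount is the usual companion to a read-only root filesystem: most applications need at least one writable directory, and declaring it explicitly keeps writes confined to a known path.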
4. Restrict image registries — restrict-registries.yaml
Docker Hub has millions of images, most unmaintained. This policy says: we only allow images from our short approved list. Pull from anywhere else and admission is denied.
Images must come from an allowlist. Edit the policy to match your setup — the default template allows registry.homelab.local (your private registry), ghcr.io (GitHub Container Registry), and cgr.dev (Chainguard hardened images). Anything else — including raw docker.io pull-throughs — is blocked at admission.
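A sketch of an allowlist rule using the three registries named above. It is written in the style of the public Kyverno policy library and is not the contents of restrict-registries.yaml; the rule name and message are assumptions:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-registries
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: allowed-registries-only       # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from registry.homelab.local, ghcr.io, or cgr.dev."
        pattern:
          spec:
            # "a | b" in a pattern value means: match a OR b.
            =(ephemeralContainers):
              - image: "registry.homelab.local/* | ghcr.io/* | cgr.dev/*"
            =(initContainers):
              - image: "registry.homelab.local/* | ghcr.io/* | cgr.dev/*"
            containers:
              - image: "registry.homelab.local/* | ghcr.io/* | cgr.dev/*"
```

An image reference without a registry prefix, such as nginx:latest, does not match any of the three patterns, which is what blocks implicit Docker Hub pulls.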
5. Require gVisor RuntimeClass in the untrusted namespace — require-runtimeclass-untrusted.yaml
Any pod scheduled in a namespace labelled untrusted must declare runtimeClassName: gvisor. This enforces the gVisor sandbox (see Layer 3) for tenant workloads you don’t fully control. If a workload doesn’t set it, admission is denied — it cannot accidentally run with the default runc runtime.
You must have gVisor installed (Layer 3 script) and the gvisor RuntimeClass created before this policy can match successfully. The script creates the RuntimeClass as part of the policy apply step.
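The two pieces might look like the sketch below. The handler name runsc is the one gVisor's containerd integration conventionally registers, and the namespace label trust: untrusted is a hypothetical choice; check both against what the Layer 3 script and require-runtimeclass-untrusted.yaml actually use.

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                            # assumed containerd handler name for gVisor
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-runtimeclass-untrusted
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-gvisor                # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaceSelector:
                matchLabels:
                  trust: untrusted        # hypothetical namespace label
      validate:
        message: "Pods in untrusted namespaces must set runtimeClassName: gvisor."
        pattern:
          spec:
            runtimeClassName: gvisor
```

Selecting by namespace label rather than by namespace name means a new tenant namespace is covered as soon as it is labelled, without editing the policy.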
6. Block unsigned images — verify-images-cosign.yaml
This is the most powerful policy in this chapter. Every image from ghcr.io/your-org/* must have a valid cosign signature from your GitHub Actions workflow. If the signature is missing or invalid, the pod is rejected — even if the image tag and digest are correct.
This policy ties the supply-chain (see Part 2) to admission control. The signature is checked against Sigstore’s Rekor transparency log; the subject must be your specific workflow file on your specific branch. Replacing the image tag or tampering with the image after signing invalidates the signature and blocks the pod.
The policy in verify-images-cosign.yaml uses keyless verification — no public key to manage or rotate. The trust anchor is GitHub’s OIDC issuer and the Rekor log.
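A sketch of what a keyless verifyImages rule can look like. The organisation, repository, workflow file, and branch are placeholders, and this is not the content of verify-images-cosign.yaml:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-images-cosign
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30               # signature lookups need more than the default timeout
  rules:
    - name: verify-ghcr-signature         # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "ghcr.io/your-org/*"
          attestors:
            - entries:
                - keyless:
                    # Identity baked into the Fulcio certificate by GitHub Actions:
                    # workflow file path plus the ref it ran on (placeholders here).
                    subject: "https://github.com/your-org/your-repo/.github/workflows/build.yaml@refs/heads/main"
                    issuer: "https://token.actions.githubusercontent.com"
                    rekor:
                      url: https://rekor.sigstore.dev
```

Pinning both subject and issuer is what distinguishes “signed by my workflow” from “signed by anyone with a Sigstore identity”. A rule that checked only for the existence of a signature would accept images signed by an attacker's own account.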
Part 2 — Supply chain: proving the image is what you think it is
The threat this solves
Imagine someone replaces a package inside a Docker image after you built it, then pushes it to your registry under the same name. Without supply-chain controls, your cluster pulls and runs the tampered image. With them, Kyverno checks the cryptographic signature — and the tampered image has no valid signature, so it never runs.
cosign keyless signing — no long-lived keys
Traditional image signing uses a private key you must store securely, rotate, and never lose. Cosign keyless eliminates the key entirely by using the CI job’s own identity:
- GitHub Actions proves it’s running your workflow (via GitHub’s OIDC token).
- Sigstore’s Fulcio CA issues a short-lived certificate binding that identity to a public key.
- cosign signs the image using that ephemeral key and records the signature in Rekor, a public, append-only transparency log.
- The certificate (and thus the key) expires in 10 minutes. No long-lived secret exists anywhere.
When Kyverno verifies the image later, it checks: “Is there an entry in Rekor signed by a certificate from Fulcio that says this digest came from github.com/myorg/myrepo/.github/workflows/build.yaml on main?” If yes, the image is trusted.
SBOM: the ingredient list for your image
A Software Bill of Materials (SBOM) is a machine-readable list of every library and dependency inside the image — think of it as a nutrition label for software. syft generates this list in the CycloneDX or SPDX format, and cosign attaches it to the image in the registry as an OCI attestation (same content-addressable layer system as the image itself).
Benefits:
- If a new CVE drops (e.g., log4j), you can query your SBOM store to know instantly which images are affected, without rescanning everything.
- Trivy (running in CI and in-cluster via Trivy Operator) reads the SBOM to produce the vulnerability report.
- This combination reaches SLSA Level 2: the image has provenance from a hosted build system.
| SLSA Level | What it means | How we achieve it |
|---|---|---|
| L1 | Provenance document exists | cosign attest + syft SBOM in CI |
| L2 | Provenance from a hosted build platform | GitHub Actions + keyless cosign |
| L3 | Hardened build platform (tamper-resistant) | GitHub Actions hardened runners |
| L4 | Hermetic, fully reproducible build (defined in SLSA v0.1; not part of the v1.0 build track) | Advanced; skip for homelab |
The CI pipeline (GitHub Actions)
The file scripts/config/supplychain/cosign-sign-ci-example.yaml contains the full workflow. Key steps:
- Build and push the image to ghcr.io, capturing the exact digest (sha256:...).
- Install cosign via sigstore/cosign-installer.
- Sign the image digest (not just the tag) with cosign sign --yes. No key needed: GitHub provides the OIDC token.
- Generate SBOM with syft in CycloneDX-JSON format.
- Attest the SBOM with cosign attest, storing it as an OCI attestation alongside the image.
Always sign the digest, not the tag. A tag like v1.2.3 is mutable — someone can push a different image under the same tag. A digest (@sha256:abc123...) is immutable. The Kyverno verifyImages policy checks the digest.
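The steps above can be sketched as a single workflow. This is an illustrative outline, not the contents of cosign-sign-ci-example.yaml: the image name ghcr.io/your-org/your-repo is a placeholder, and the action versions are major-version tags that you may prefer to pin to commit SHAs.

```yaml
name: build-sign-attest
on:
  push:
    branches: [main]
permissions:
  contents: read
  packages: write        # push to ghcr.io
  id-token: write        # request the OIDC token used for keyless signing
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ghcr.io/your-org/your-repo:${{ github.sha }}   # placeholder image name
      - uses: sigstore/cosign-installer@v3
      - uses: anchore/sbom-action/download-syft@v0
      - name: Sign the digest and attest the SBOM
        env:
          # Reference the image by digest, never by tag.
          IMAGE: ghcr.io/your-org/your-repo@${{ steps.build.outputs.digest }}
        run: |
          cosign sign --yes "$IMAGE"
          syft "$IMAGE" -o cyclonedx-json > sbom.cdx.json
          cosign attest --yes --type cyclonedx --predicate sbom.cdx.json "$IMAGE"
```

The id-token: write permission is the piece most often missed: without it the job cannot obtain an OIDC token, and keyless signing has no identity to present.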
Private registry: Harbor or registry:2
A private registry is your own internal copy of Docker Hub. Images you’ve built and vetted live here, and your cluster only pulls from here (enforced by the restrict-registries policy). Nothing comes from the public internet at runtime.
Harbor (CNCF Graduated) is the full-featured choice: built-in Trivy scanning, SBOM storage, cosign integration, a web UI, and pull-through caching of upstream registries.
registry:2 (the official Docker Distribution binary) is the minimal choice: it’s a single container, uses almost no RAM, and can be configured as a pull-through cache.
| | Harbor | registry:2 |
|---|---|---|
| RAM footprint | ~1 GB | ~50 MB |
| Built-in scanning | Yes (Trivy) | No |
| SBOM / cosign UI | Yes | No |
| Pull-through cache | Yes | Yes |
| Use when | ≥ 8 GB RAM | RAM-constrained |
Scanning and signing still happen in CI regardless of which registry you use — Harbor adds a visual layer on top.
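For the minimal option, a pull-through cache is a few lines of registry configuration. The sketch below assumes the stock config file location inside the registry:2 image (/etc/docker/registry/config.yml) and Docker Hub as the upstream; adjust both to your setup.

```yaml
# /etc/docker/registry/config.yml (assumed location inside the registry:2 container)
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  remoteurl: https://registry-1.docker.io   # upstream registry to mirror
```

One design consequence to plan for: a registry running in proxy mode does not accept pushes, so a registry:2 setup that both caches upstream images and hosts your own builds needs two separate instances.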
Part 3 — Secrets: keeping sensitive values out of Git and encrypted at rest
Two problems, two solutions
Problem 1: secrets at rest in the cluster. Kubernetes stores its Secret objects in its database (the embedded etcd datastore configured in Layer 2). Without encryption, anyone who can read the datastore files on disk, or who steals a cluster backup, has all your secrets in plain text.
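Solution: k3s --secrets-encryption. One flag at install time. k3s encrypts every Secret object before writing it to disk. The default provider is AES-CBC; recent k3s releases can select secretbox (XSalsa20-Poly1305) instead through the secrets-encryption-provider option, so check which one your version and config actually use. The encryption key is stored in /var/lib/rancher/k3s/server/cred/encryption-config.json. Protect this file like a root password.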
sudo grep secrets-encryption /etc/rancher/k3s/config.yaml
# should show: secrets-encryption: true
# Verify the encryption config was written (the output contains the key: keep it out of logs and shared screens)
sudo cat /var/lib/rancher/k3s/server/cred/encryption-config.json
# shows the configured encryption provider and key material
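Rotate the encryption key quarterly (k3s v1.28 and later; older releases use separate prepare, rotate, and reencrypt stages with a server restart between each):
sudo k3s secrets-encrypt rotate-keys   # generates a new key and re-encrypts all secrets with it
sudo k3s secrets-encrypt status        # wait until re-encryption is reported as finished
sudo systemctl restart k3s             # restart so the server runs with the rotated key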
Problem 2: secrets in Git (GitOps). If you store Kubernetes manifest files in a Git repository (which you should, for repeatability), you cannot put raw Secret values in those files. Base64 is not encryption — anyone who can read the repo can read the secrets.
SOPS + age: the recommended solution
Imagine you have a padlock (the age public key) that anyone can close, but only you have the key to open. You click the padlock shut on your secrets file before committing to Git. The locked file is safe to share publicly. Only the person with the physical key (your laptop’s age private key, and your CI) can open it.
age is a modern, simple encryption tool. SOPS is a tool that encrypts YAML/JSON files in-place — it leaves the structure readable (you can see that a key is called password) but encrypts the value. This means git diff and code review still work.
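A sketch of the SOPS configuration that produces this behaviour. The file-name convention in path_regex and the age recipient are placeholders; the real public key is the one printed when you generate your key pair.

```yaml
# .sops.yaml at the GitOps repo root
creation_rules:
  - path_regex: .*\.sops\.ya?ml$                  # hypothetical naming convention for encrypted files
    encrypted_regex: ^(data|stringData)$          # encrypt only the Secret values, leave the rest readable
    age: age1replace-with-your-public-key         # placeholder recipient (public key, safe to commit)
```

Restricting encryption to data and stringData is what keeps metadata, kind, and key names in plain text, so reviewers can see which Secret changed without being able to read its values.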
Why not Sealed Secrets? Sealed Secrets is a good alternative (see scripts/config/supplychain/sops-age-example.md for a comparison), but its decryption key lives inside the cluster — if you lose the cluster before backing up that key, you can never recover your sealed secrets. The age private key lives on your laptop and in CI, entirely independent of the cluster.
The key flow
- Admin laptop: generate one age key pair, store the private key at ~/.config/sops/age/keys.txt. The public key goes in .sops.yaml in the GitOps repo root.
- CI (GitHub Actions): store the same age private key as a repository secret (AGE_SECRET_KEY). CI uses it to decrypt during dry-run validation.
- Flux in the cluster: the age private key is loaded once into a Kubernetes Secret in flux-system. Flux uses it to decrypt SOPS-encrypted manifests before applying them.
The private key never leaves these three places, and it is never checked into Git.
Full step-by-step is in scripts/config/supplychain/sops-age-example.md.
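On the cluster side, decryption is switched on per Flux Kustomization. The sketch below assumes the Secret is named sops-age and that manifests live under ./apps; the Kustomization name, path, and interval are placeholders.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps                    # hypothetical name
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps                  # placeholder path inside the GitOps repo
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      # Secret holding the age private key; Flux expects the key
      # under an entry whose name ends in .agekey
      name: sops-age
```

Because decryption is configured per Kustomization, only the reconcilers that actually need secrets have to reference the key.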
Hard rules for secrets
These rules apply to every workload in the cluster, no exceptions:
- Never store secret values in a ConfigMap, pod environment variables, annotations, or image layers.
- Mount Secrets as files (volumes), not environment variables. Environment variables are readable from /proc/<pid>/environ, are inherited by child processes, and tend to end up in crash dumps and debug logs; literal values also show up in kubectl describe pod output.
- Set readOnly: true on all Secret volume mounts.
- Never use envFrom pointing to a Secret in a pod that runs a public image: the Secret name is exposed in the spec and can be harvested.
- The age private key lives in exactly three places: your laptop, CI secrets, and the flux-system/sops-age Kubernetes Secret. Nowhere else.
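As an illustration of the file-mount rule, a Pod might consume a Secret like this. The Pod name, image, Secret name, and mount path are all placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-secret                   # hypothetical name
spec:
  containers:
    - name: app
      image: registry.homelab.local/example/app:1.0.0   # placeholder image
      volumeMounts:
        - name: db-credentials
          mountPath: /var/run/secrets/db  # each Secret key appears as a file in this directory
          readOnly: true
  volumes:
    - name: db-credentials
      secret:
        secretName: db-credentials        # hypothetical Secret
```

The application then reads, for example, /var/run/secrets/db/password at startup. Nothing secret appears in the Pod spec or the process environment.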
Part 4 — GitOps with Flux v2
What GitOps means
Instead of running kubectl apply by hand, you commit your desired cluster state to a Git repository. A controller (Flux) watches that repo and automatically applies changes. The Git history becomes the audit log of every change ever made to the cluster. Rolling back is a git revert.
Why Flux over ArgoCD
Both are production-grade GitOps controllers. For a single-admin hardened homelab:
| | Flux v2 | ArgoCD |
|---|---|---|
| Memory | ~150–200 MB | ~400–600 MB |
| SOPS native | Yes: first-class .decryption field | Via plugin |
| UI | CLI-only (smaller attack surface) | Rich web UI |
| Multi-cluster | Possible, somewhat manual | Excellent, native |
| ARM64/Pi | Yes | Yes |
Flux wins on: footprint, SOPS-native decryption, and smaller attack surface (no UI to secure). Choose ArgoCD if you need the UI or manage multiple clusters.
Flux bootstrap
The script scripts/cluster/26-supplychain.sh handles the optional Flux bootstrap (set FLUX_ENABLED=1 to activate). The full manual steps and the egress NetworkPolicy that locks down the flux-system namespace are in scripts/config/supplychain/flux-bootstrap.md.
The flux-system NetworkPolicy restricts Flux’s egress to: your Git host (port 443), your container registry (port 443), the Kubernetes API server (port 6443), and DNS (port 53, UDP and TCP). Flux has no business talking to anything else.
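A rough sketch of such a policy follows; the authoritative version is the one in flux-bootstrap.md. A standard NetworkPolicy matches IP ranges, not hostnames, so this sketch restricts ports only. Pinning the Git host and registry as destinations would need ipBlock CIDRs or a CNI with DNS-aware policies. The policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: flux-egress                # hypothetical name
  namespace: flux-system
spec:
  podSelector: {}                  # applies to every Flux controller pod
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector: {}          # controllers fetch artifacts from each other inside flux-system
    - ports:                       # DNS
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - ports:                       # Git host and container registry over HTTPS
        - protocol: TCP
          port: 443
    - ports:                       # Kubernetes API server
        - protocol: TCP
          port: 6443
```

The first rule is easy to forget: without intra-namespace egress, the kustomize and helm controllers cannot download the artifacts that the source controller serves, and reconciliation stalls even though Git access works.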
After bootstrap, verify that Flux can see and reconcile your repo:
flux get sources git
flux get kustomizations
flux logs --follow --level=error # watch for reconciliation errors
Putting it all together: the chain of custody
Every workload in your cluster travels this chain:
- Code is committed and a CI build starts.
- Trivy scans the image for known vulnerabilities. A critical-severity finding blocks the build.
- The image is pushed to the registry (ghcr.io or Harbor) and its digest is captured.
- cosign signs the image digest with a keyless signature tied to the specific workflow and branch. Rekor records it.
- syft generates a CycloneDX SBOM and cosign attests it alongside the image.
- The GitOps manifests are updated with the new digest, and any SOPS-encrypted secrets are committed.
- Flux detects the change, decrypts any encrypted secrets using the age key, and sends the manifests to the Kubernetes API.
- Kyverno’s verifyImages policy checks the signature against Rekor. If valid: the pod is admitted. If missing or invalid: admission is denied, the pod never starts, and an audit event is logged.
What this layer bought you
| Capability | What it prevents |
|---|---|
| Kyverno restricted PSS | Privileged containers, hostPath mounts, host networking, missing security contexts |
| Resource limits enforcement | Node-level denial-of-service from runaway workloads |
| Registry allowlist | Images pulled from untrusted or compromised public registries |
| gVisor RuntimeClass enforcement | Workloads in the untrusted namespace accidentally running on the default runc runtime instead of the gVisor sandbox |
| cosign keyless signing + Kyverno verifyImages | Tampered images, images built outside CI, supply-chain attacks |
| syft SBOM attestations | Unknown dependencies; enables instant CVE blast-radius queries |
| k3s --secrets-encryption | Secrets exposed via cluster database file or backup theft |
| SOPS + age | Secrets committed to Git in plain text; secrets lost if cluster is rebuilt |
| Flux NetworkPolicy | Compromised Flux controller exfiltrating data to arbitrary endpoints |
A cluster that completes this layer will run only signed, scanned images from approved registries, enforce that all workloads run as unprivileged processes on a read-only root filesystem with resource caps, keep secrets encrypted at rest and in Git, and reconcile state from a version-controlled repository with a full audit trail.