K3s Node Hygiene
FractalOps K3s nodes must not depend on manual container runtime cleanup. Apply the managed K3s drop-in with:

platform/k8s/reconcile_k3s_node_gc.sh

Set RESTART=true only during a maintenance window:

RESTART=true platform/k8s/reconcile_k3s_node_gc.sh

The drop-in configures kubelet image garbage collection, disk eviction thresholds, and container log rotation:
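kubelet-arg:
  - "image-gc-high-threshold=85"
  - "image-gc-low-threshold=80"
  - "eviction-hard=nodefs.available<10%,imagefs.available<10%"
  - "eviction-minimum-reclaim=nodefs.available=5Gi,imagefs.available=5Gi"
  - "container-log-max-size=20Mi"
  - "container-log-max-files=3"

Image garbage collection starts when image filesystem usage exceeds 85%. It removes unused images until usage falls to 80%. Hard eviction starts when free space on the node or image filesystem drops below 10%. Once it triggers, the kubelet reclaims at least 5Gi beyond that threshold. Container logs rotate at 20Mi, keeping at most 3 files per container.

Do not add terminated-pod-gc-threshold to kubelet-arg. It is a kube-controller-manager flag, not a kubelet flag. This K3s/Kubernetes version rejects it as a kubelet argument, and the node fails readiness.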
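The reconcile script itself is not reproduced here. As a rough sketch of one way such a step could work, the following does three things:

- It writes the drop-in only when its content changes.
- It refuses any drop-in that mentions terminated-pod-gc-threshold.
- It restarts K3s only when RESTART=true.

Several details are assumptions, not taken from platform/k8s/reconcile_k3s_node_gc.sh: the drop-in location under /etc/rancher/k3s/config.yaml.d/, the file name 50-node-gc.yaml, and the K3S_UNIT systemd unit name.

```bash
#!/usr/bin/env bash
# Sketch of an idempotent K3s drop-in reconcile step.
# ASSUMPTIONS (hypothetical, not from reconcile_k3s_node_gc.sh):
#   the DROPIN_PATH default location/file name and the K3S_UNIT unit name.
DROPIN_PATH="${DROPIN_PATH:-/etc/rancher/k3s/config.yaml.d/50-node-gc.yaml}"
K3S_UNIT="${K3S_UNIT:-k3s}"   # k3s-agent on agent-only nodes

# Desired drop-in content, matching the kubelet-arg list above.
render_node_gc_dropin() {
  cat <<'EOF'
kubelet-arg:
  - "image-gc-high-threshold=85"
  - "image-gc-low-threshold=80"
  - "eviction-hard=nodefs.available<10%,imagefs.available<10%"
  - "eviction-minimum-reclaim=nodefs.available=5Gi,imagefs.available=5Gi"
  - "container-log-max-size=20Mi"
  - "container-log-max-files=3"
EOF
}

# Reads desired content on stdin and writes it to $1 only if it differs.
# Prints "changed" or "unchanged"; returns 2 without writing if the content
# contains terminated-pod-gc-threshold (not a kubelet flag).
reconcile_dropin() {
  local target="$1" desired
  desired="$(cat)"
  if printf '%s\n' "$desired" | grep -q 'terminated-pod-gc-threshold'; then
    echo "refusing: terminated-pod-gc-threshold is not a kubelet flag" >&2
    return 2
  fi
  if [ -f "$target" ] && [ "$(cat "$target")" = "$desired" ]; then
    echo unchanged
    return 0
  fi
  mkdir -p "$(dirname "$target")" || return 1
  printf '%s\n' "$desired" > "$target" || return 1
  echo changed
}

# Restarts K3s whenever RESTART=true; otherwise only warns if the drop-in changed.
maybe_restart_k3s() {
  if [ "${RESTART:-false}" = true ]; then
    systemctl restart "$K3S_UNIT"
  elif [ "$1" = changed ]; then
    echo "drop-in updated but not applied; rerun with RESTART=true in a maintenance window" >&2
  fi
}
```

On a node, the pieces would be combined roughly as:

state="$(render_node_gc_dropin | reconcile_dropin "$DROPIN_PATH")" && maybe_restart_k3s "$state"

Comparing content before writing keeps repeated runs from touching the file. Gating the restart on RESTART=true mirrors the maintenance-window rule above.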
Routine validation:

ssh root@10.10.10.41 "kubectl get --raw='/readyz?verbose'"
ssh root@10.10.10.41 'kubectl get nodes -o wide'
ssh root@10.10.10.41 'kubectl get cronjobs -A'

Daytona runner nodes are intentionally Docker-backed. On Debian-packaged Docker, socket activation can leave /run/docker.sock present while docker info cannot connect. Reconcile runner nodes with:
platform/k8s/reconcile_daytona_runner_docker.sh

The script selects the node labeled daytona-sandbox-c=true and disables docker.socket. It then configures docker.service to own unix:///run/docker.sock directly. This keeps the Daytona runner chart lightweight: node runtime preparation stays on the node rather than in a Kubernetes mutation shim.
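To confirm the failure mode before and after running the script, a small probe can distinguish three states:

- The socket is missing.
- The socket path exists but does not answer (the Debian socket-activation symptom).
- The daemon is healthy.

docker_socket_state is a hypothetical helper, not part of reconcile_daytona_runner_docker.sh. The probe command is passed in so the function can be exercised without Docker.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of reconcile_daytona_runner_docker.sh).
# Usage: docker_socket_state SOCKET_PATH PROBE_COMMAND [ARGS...]
# Prints "missing", "stale", or "healthy".
docker_socket_state() {
  local sock="$1"
  shift
  if [ ! -e "$sock" ]; then
    echo missing
  elif "$@" >/dev/null 2>&1; then
    echo healthy
  else
    # The socket path exists but the probe cannot reach dockerd:
    # the socket-activation symptom described above.
    echo stale
  fi
}
```

On a runner node, the following should print healthy once docker.service owns the socket:

docker_socket_state /run/docker.sock docker -H unix:///run/docker.sock info

A stale result is the case the reconcile script is meant to fix.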
Keep the Daytona runner region aligned with the live runner registration. The K3s runner currently registers as fractalops-k3s. Do not reintroduce the old dev-workspaces_v7FU region; otherwise the scheduler will report No available runners even when the runner pod is healthy.

Keep Kubernetes spec changes in the Daytona Helm values or chart version. Do not use kubectl set env or ad hoc DaemonSet JSON patches as a permanent reconciliation path.
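Because the region lives in Helm values, a pre-deploy guard can catch the stale region before it reaches the cluster. This sketch assumes the values file contains a plain `region: <value>` line; the real Daytona chart's key path may differ. check_runner_region is a hypothetical name.

```bash
#!/usr/bin/env bash
# Pre-deploy guard for the Daytona runner region (hypothetical helper).
# ASSUMPTION: the values file holds a plain "region: <value>" line; the real
# chart's key path may differ. Only the first such line is checked.
EXPECTED_REGION="${EXPECTED_REGION:-fractalops-k3s}"
STALE_REGION="dev-workspaces_v7FU"

check_runner_region() {
  local values_file="$1" found
  # First region value: drop the key, keep the first token, strip quotes.
  found="$(sed -n 's/^[[:space:]]*region:[[:space:]]*//p' "$values_file" \
    | head -n 1 | awk '{print $1}' | tr -d "\"'")"
  if [ "$found" = "$STALE_REGION" ]; then
    echo "stale region $found: the scheduler would report No available runners" >&2
    return 1
  fi
  if [ "$found" != "$EXPECTED_REGION" ]; then
    echo "region '${found:-<unset>}' does not match live runner '$EXPECTED_REGION'" >&2
    return 1
  fi
  echo "ok: $found"
}
```

A typical gate would be:

check_runner_region values.yaml && helm upgrade --install <release> <chart> -f values.yaml

This keeps the change in the Helm path rather than in kubectl patches. A text-level check avoids a YAML parser dependency but only sees the first region line. A values file with several region keys would need a real parser such as yq.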
To keep finished workloads from accumulating, prefer native retention settings:

- CronJob.spec.successfulJobsHistoryLimit
- CronJob.spec.failedJobsHistoryLimit
- Job.spec.ttlSecondsAfterFinished, when the chart exposes it
- Deployment.spec.revisionHistoryLimit, for high-churn Deployments
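For orientation, the sketch below prints a CronJob manifest with the three Job-related retention fields in place. The history limits sit on CronJob.spec, while ttlSecondsAfterFinished belongs to the Job spec inside jobTemplate. The name, schedule, image, and limit values are placeholders, not FractalOps settings. In this setup the fields would normally be set through the relevant Helm values rather than a hand-written manifest.

```bash
#!/usr/bin/env bash
# Placeholder values throughout; not FractalOps settings.
# Usage: render_cronjob NAME SCHEDULE IMAGE
render_cronjob() {
  local name="$1" schedule="$2" image="$3"
  cat <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ${name}
spec:
  schedule: "${schedule}"
  successfulJobsHistoryLimit: 1   # Kubernetes default is 3
  failedJobsHistoryLimit: 1       # Kubernetes default is 1
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 3600   # delete finished Jobs and their Pods after 1h
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: ${name}
              image: ${image}
EOF
}
```

A rendered manifest can be checked without touching the cluster:

render_cronjob nightly-report '0 2 * * *' busybox:1.36 | kubectl apply --dry-run=client -f -

Deployment.spec.revisionHistoryLimit applies the same idea to a Deployment spec. Kubernetes keeps 10 old ReplicaSets when it is unset.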
Use direct deletion only for objects that are already stale, and only after checking their owners:

kubectl delete pod -A --field-selector=status.phase=Succeeded --wait=false
kubectl delete pod -A --field-selector=status.phase=Failed --wait=false
kubectl get rs -A -o jsonpath='{range .items[?(@.spec.replicas==0)]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' \
  | while read -r ns name; do kubectl -n "$ns" delete rs "$name" --wait=false; done
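The ReplicaSet loop above deletes every zero-replica ReplicaSet without showing who owns it. A more cautious sketch prints each candidate with its first owner reference and only deletes when APPLY=true. APPLY is a hypothetical switch. Zero-replica ReplicaSets owned by a Deployment are that Deployment's rollback history, so Deployment.spec.revisionHistoryLimit is usually the better lever for those.

```bash
#!/usr/bin/env bash
# Reads "NAMESPACE NAME OWNER" lines on stdin (OWNER like Deployment/web).
# Default: print each delete command with its owner for review.
# APPLY=true (hypothetical switch): actually run the deletes.
plan_rs_cleanup() {
  local ns name owner
  while read -r ns name owner; do
    [ -n "$name" ] || continue
    # Unowned ReplicaSets arrive with an empty or bare "/" owner column.
    case "$owner" in
      ""|/) owner=none ;;
    esac
    if [ "${APPLY:-false}" = true ]; then
      kubectl -n "$ns" delete rs "$name" --wait=false
    else
      echo "kubectl -n $ns delete rs $name --wait=false  # owner: $owner"
    fi
  done
}
```

Feed it the same selection as above, with owner columns added:

kubectl get rs -A -o jsonpath='{range .items[?(@.spec.replicas==0)]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.metadata.ownerReferences[0].kind}{"/"}{.metadata.ownerReferences[0].name}{"\n"}{end}' | plan_rs_cleanup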