Nexus Docker Mirror Cutover

Nexus docker-mirror + buildcache cutover (replaced registry-cache CT113)

This is the historical record of folding the standalone registry-cache LXC (CT113 @ 10.10.10.150: docker.io proxy :5000, ghcr proxy :5002, build cache :5001, k3s mirror for 6 nodes) into the single Nexus instance. This was the irreversible live step of Nexus consolidation P3, so it was applied deliberately, node by node, with rollback ready. The cutover is complete (CT113 destroyed 2026-06-18); keep this page as the procedure of record and rollback reference.

The single cache plane is now Nexus — see Build & Package Cache Topology (Nexus).

  • ops/nexus/repos.json: docker-dockerhub (proxy of registry-1.docker.io) and docker-group (members docker-internal + docker-dockerhub + docker-ghcr, read connector port 8083).
  • platform/k8s/apps/nexus: Service + Deployment expose port 8083 (docker-group).

Step 1 — provision the new repos (additive, safe)

From a node that can reach the Nexus ClusterIP (per the original P1 provisioning):

NEXUS_URL=http://<nexus-clusterip>:8081 NEXUS_ADMIN_PASSWORD=<admin-pw-from-OpenBao> \
python3 ops/nexus/provision_repos.py

The script is idempotent, so re-running it is safe. To verify, confirm that docker-group answers the V2 API on 8083 and serves a docker.io pull:

curl -s -o /dev/null -w '%{http_code}' http://<nexus>:8083/v2/ # 401/200 = connector up
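
The same connector check can be scripted when it has to run from several nodes. The following is a minimal sketch using only the Python standard library; the function names `probe_v2` and `connector_is_up` are hypothetical and not part of the repository, and the base URL is whatever address the node uses to reach port 8083.

```python
import urllib.error
import urllib.request


def connector_is_up(status: int) -> bool:
    """200 (anonymous access) and 401 (auth required) both mean the V2 API answered."""
    return status in (200, 401)


def probe_v2(base_url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of GET <base_url>/v2/.

    Raises urllib.error.URLError if the connector is unreachable.
    """
    url = f"{base_url.rstrip('/')}/v2/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 401 arrives as an HTTPError; the status code is still the answer we want.
        return err.code
```

Usage would look like `connector_is_up(probe_v2("http://<nexus>:8083"))`. Any other status, or a connection error, suggests the connector or the route to it is not ready.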

docker-group (8083) must be reachable from every k3s node for registries.yaml. Add a NodePort (e.g. 30083 -> 8083) to the nexus Service, OR confirm nodes can reach the ClusterIP :8083 via kube-proxy (P1 used 10.43.x:8081 from node 146). Prefer NodePort + http://127.0.0.1:30083 per node for stability across pod reschedules.

Step 3 — cut over the k3s registry mirror (per node, one at a time)

For each node in topology registry_cache.k3s_mirror_nodes (pve-worker0:112, pve-worker1:143, pve-worker2:129, pve-worker3:145, pve1:141, pve2:146), rewrite /etc/rancher/k3s/registries.yaml:

mirrors:
  docker.io:
    endpoint: ["http://127.0.0.1:30083"]
  ghcr.io:
    endpoint: ["http://127.0.0.1:30083"]
configs:
  "127.0.0.1:30083":
    auth: { username: "shim-puller", password: "<FRACTALOPS_NEXUS_PULL_PASSWORD from OpenBao runtime scope>" }
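
Because the password is pasted into the file on six nodes, it may help to render the file from one template instead of hand-editing it. This is a sketch under assumptions: `render_registries_yaml` is a hypothetical helper, not something in the repository, and it relies on JSON string quoting being valid YAML double-quoted syntax so that special characters in the password are escaped.

```python
import json


def render_registries_yaml(endpoint: str, username: str, password: str) -> str:
    """Render a k3s registries.yaml that mirrors docker.io and ghcr.io to one endpoint."""
    host = endpoint.split("://", 1)[-1]  # configs are keyed by host:port, without scheme
    q = json.dumps  # JSON strings are valid YAML double-quoted scalars
    lines = [
        "mirrors:",
        "  docker.io:",
        f"    endpoint: [{q(endpoint)}]",
        "  ghcr.io:",
        f"    endpoint: [{q(endpoint)}]",
        "configs:",
        f"  {q(host)}:",
        "    auth:",
        f"      username: {q(username)}",
        f"      password: {q(password)}",
    ]
    return "\n".join(lines) + "\n"
```

The output is meant to be written to /etc/rancher/k3s/registries.yaml on each node, with the password read from OpenBao at run time rather than stored in the script.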

Then restart the k3s service: systemctl restart k3s on control-plane nodes, systemctl restart k3s-agent on workers. Verify a fresh pull before moving to the next node (crictl pull docker.io/library/busybox:latest). On any failure, roll back that node: restore the old registries.yaml (pointing at CT113) and restart the service again.
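
The one-node-at-a-time discipline can be sketched as a small driver loop. This is illustrative only: `rolling_cutover` and its three callables are hypothetical, the actual cutover was a manual procedure, and stopping at the first failed node is an assumption about the safest behaviour rather than something stated in the procedure.

```python
from typing import Callable, Iterable, List


def rolling_cutover(
    nodes: Iterable[str],
    apply: Callable[[str], None],
    verify: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> List[str]:
    """Cut nodes over one at a time.

    On the first node whose apply or verify step fails, roll that node back
    and stop. Returns the nodes that were cut over and verified.
    """
    done: List[str] = []
    for node in nodes:
        try:
            apply(node)          # e.g. write registries.yaml and restart k3s / k3s-agent
            ok = verify(node)    # e.g. crictl pull docker.io/library/busybox:latest
        except Exception:
            ok = False
        if not ok:
            rollback(node)       # e.g. restore the old registries.yaml and restart
            break
        done.append(node)
    return done
```

Injecting the three steps as callables keeps the loop testable without touching a real node; in practice each callable would wrap an SSH command against the node.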

Repoint topology registry_cache.buildcache_ref (10.10.10.150:5001/fractalops/buildcache) → nexus.nexus.svc:8082/fractalops/buildcache (docker-internal hosted; platform CI image builds). Ensure CI has Nexus push creds (FRACTALOPS_NEXUS_DOCKER_USER/PASSWORD). cache_to/from already carry ignore-error=true, so a misconfig degrades to a cache miss, never a build failure. Also repoint the langboard build_image_prefix (topology ~L1266).
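
After repointing, a quick sanity check on the cache option strings can catch a stale ref or a dropped ignore-error flag. The sketch below assumes the options use the comma-separated key=value form (for example `type=registry,ref=...,ignore-error=true`); that format and the helper names `parse_cache_option` and `cache_option_problems` are assumptions, not taken from the CI configuration.

```python
from typing import Dict, List


def parse_cache_option(option: str) -> Dict[str, str]:
    """Split a 'key=value,key=value' cache option string into a dict."""
    fields: Dict[str, str] = {}
    for part in option.split(","):
        key, _, value = part.partition("=")
        if key.strip():
            fields[key.strip()] = value.strip()
    return fields


def cache_option_problems(option: str, expected_ref: str) -> List[str]:
    """Return human-readable problems; an empty list means the option looks right."""
    fields = parse_cache_option(option)
    problems: List[str] = []
    if fields.get("ref") != expected_ref:
        problems.append(f"ref is {fields.get('ref')!r}, expected {expected_ref!r}")
    if fields.get("ignore-error") != "true":
        problems.append("ignore-error=true is missing")
    return problems
```

For example, checking an option against `nexus.nexus.svc:8082/fractalops/buildcache` reports both a leftover `10.10.10.150:5001` ref and a missing ignore-error flag.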

Step 5 — decommission CT113 — DONE (2026-06-18 LXC destroyed; repo cleanup 2026-06-20)

  • LXC CT113 stopped + destroyed (2026-06-18).
  • Deleted ops/infra/install_registry_cache.sh (+ its test). (No registry-cache Makefile targets existed.)
  • topology registry_cache block repointed to Nexus (host_ip = Nexus ClusterIP, buildcache_ref: nexus.nexus.svc:8082/..., ports 8082/8083); the historical key name is kept as the consumed build-cache config (no dead 10.10.10.150/CT id left).
  • Docs aligned to the single Nexus cache plane (operations/docker-cache, current-stack-IA, requirements, manual-ko).
  • pip: point uv/pip index at https://nexus.yamon.io/repository/pypi-group/simple/ (or in-cluster http://nexus.nexus.svc:8081/repository/pypi-group/simple/); decommission pip-cache LXC.
  • apt: point sources at https://nexus.yamon.io/repository/apt-debian/; decommission apt-cacher LXC.
  • apt-debian proxies debian/trixie only; add an apt-ubuntu proxy repo first if any image pulls from ubuntu mirrors.

Restore each node’s previous registries.yaml (CT113 endpoints) + restart k3s; revert topology buildcache_ref. CT113 stayed running until Step 5, so rollback was always available during the cutover.