Helm Solution Reinvention Audit
Read-only audit run: 2026-06-11.
Goal: compare FractalOps Helm/Argo integrations against current upstream APIs, charts, and operators. Remove local control logic where upstream reconciliation already exists. Keep only thin values, policy, and integration overlays.
Priority Cut Order
- Daytona runtime transport
  - Local smell: custom toolbox endpoint probing, base64/process upload fallback, and marker-string readiness in `daytona_workspace.py`, `daytona_runner.py`, and `daytona_toolbox_readiness.py`.
  - Replace with: official Daytona Python SDK `sandbox.process.exec`, `sandbox.fs.upload_file`, `sandbox.fs.upload_files`, plus one caller retry policy.
  - Sources: https://www.daytona.io/docs/en/python-sdk/, https://www.daytona.io/docs/en/python-sdk/sync/process/, https://www.daytona.io/docs/en/python-sdk/sync/file-system/
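For the Daytona item, "one caller retry policy" could look roughly like the sketch below. It is an illustration only: `call_with_retry`, its defaults, and the choice of retryable exceptions are assumptions, and the SDK call is passed in as a plain callable so nothing here depends on Daytona internals.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    *,
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with exponential backoff.

    Hypothetical helper: the retryable exception types are an assumption and
    should be narrowed to whatever the SDK actually raises for transport errors.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # budget exhausted: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A caller would wrap a single SDK call, for example `call_with_retry(lambda: sandbox.process.exec("make test"))`, instead of layering probing and fallback paths around it.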
- Redpanda event stream
  - Local smell: raw Apache Kafka `StatefulSet`, manual KRaft env, and `topic-init-job`, while product language and topology already use Redpanda.
  - Replace with: official Redpanda Helm chart. Keep `kafka` only as the Kafka-compatible protocol endpoint and Argo release name for clients.
  - Sources: https://docs.redpanda.com/current/deploy/deployment-option/self-hosted/kubernetes/helm-chart/, https://docs.redpanda.com/current/reference/k-redpanda-helm-spec/
  - Status: done. `platform/k8s/apps/kafka` is now a Redpanda chart wrapper owning StatefulSet/PVC/listeners. Local chart keeps Kafbat UI, ingress, OpenBao-backed OIDC secret, and network policy.
- Temporal control plane
  - Local smell: DB-backed queues, retry budgets, mailbox wakeups, runner reconciliation, status projection, and schedule dedupe duplicate Temporal primitives.
  - Replace with: typed Workflows, Activities with `RetryPolicy`, Signals, Queries, Schedules, `workflow.wait_condition`, Search Attributes, Activity heartbeats, and SDK tracing. Keep DB rows as projection/audit only.
  - Sources: https://docs.temporal.io/develop/python/best-practices/error-handling, https://docs.temporal.io/develop/python/workflows/message-passing, https://docs.temporal.io/develop/python/workflows/schedules, https://docs.temporal.io/search-attribute, https://docs.temporal.io/develop/python/platform/observability
- Project / in-sandbox build plane (retired)
  - Status: removed. The in-sandbox compose build-and-ship plane and the per-project image build pipeline were deleted. `docker` is a permanent exit-127 wall inside the agent sandbox; dev previews run as bare processes exposed through the daytona-proxy signed preview URL (see Dev Preview Plane).
  - The only surviving image build is the platform’s own CI runtime-image release pipeline, which is GitOps-pinned and uses the single Nexus build cache. There is no privileged BuildKit Deployment / Service / TCP IngressRoute in the runtime chart, and no unauthenticated build edge.
  - Persistent project services (databases, static-site / vercel-sim hosting, big-facility compose) live on the project’s Dokploy plane, not the sandbox.
- GitHub Actions runner
  - Local smell: static privileged runner Deployments, broad Kubernetes RBAC, and long-lived provider/deploy secrets injected into idle runner pods.
  - Replace with: Actions Runner Controller `gha-runner-scale-set`, ephemeral runners, GitHub App auth, runner groups, `containerMode` (`dind` or Kubernetes), and deploy secrets scoped to workflow/environment identity.
  - Sources: https://docs.github.com/en/actions/concepts/runners/actions-runner-controller, https://docs.github.com/en/actions/how-tos/manage-runners/use-actions-runner-controller/deploy-runner-scale-sets, https://docs.github.com/en/actions/reference/runners/self-hosted-runners, https://docs.github.com/actions/hosting-your-own-runners/customizing-the-containers-used-by-jobs
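For the runner item, a thin values overlay for the `gha-runner-scale-set` chart might be generated as in the sketch below. The function name and all argument values are hypothetical; the keys are the ones the upstream chart documents, and the referenced secret is assumed to be a pre-provisioned GitHub App credential rather than a PAT.

```python
import json


def runner_scale_set_values(
    github_config_url: str,
    github_app_secret: str,
    runner_group: str,
    max_runners: int = 5,
    container_mode: str = "dind",
) -> dict:
    """Build values for the gha-runner-scale-set chart (illustrative sketch)."""
    if container_mode not in ("dind", "kubernetes"):
        raise ValueError("containerMode.type must be 'dind' or 'kubernetes'")
    return {
        "githubConfigUrl": github_config_url,
        # Name of an existing Secret holding GitHub App credentials (assumed
        # to be delivered by ESO, not written into values).
        "githubConfigSecret": github_app_secret,
        "runnerGroup": runner_group,
        "minRunners": 0,  # no idle pods holding credentials between jobs
        "maxRunners": max_runners,
        "containerMode": {"type": container_mode},
    }


if __name__ == "__main__":
    # Hypothetical organisation, secret, and group names.
    values = runner_scale_set_values(
        "https://github.com/example-org", "gha-app-auth", "deploy"
    )
    print(json.dumps(values, indent=2))
```

JSON is valid YAML, so the output can be passed to Helm as a values file without a separate YAML serializer.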
- OpenBao / External Secrets / SPIRE
  - Local smell: app runtime performs direct OpenBao login/cache and custom SPIFFE JWT-SVID/protobuf handling while ESO and SPIRE are already installed.
  - Replace with: ESO-managed Kubernetes `Secret`s, scoped `SecretStore` or constrained `ClusterSecretStore`, SPIRE `ClusterSPIFFEID`, CSI socket, and OpenBao JWT/OIDC validation through standard auth paths.
  - Sources: https://external-secrets.io/main/provider/hashicorp-vault/, https://external-secrets.io/main/api/clustersecretstore/, https://openbao.org/docs/next/auth/kubernetes/, https://openbao.org/docs/next/auth/jwt/, https://github.com/spiffe/spire-controller-manager/blob/main/docs/clusterspiffeid-crd.md, https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE_Workload_API.md
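As a sketch of the ESO path, the helper below emits an `ExternalSecret` that lets the operator sync an OpenBao path into a Kubernetes Secret, so the app never logs in to OpenBao itself. Names, the refresh interval, and the `v1beta1` API version are assumptions; the served version depends on the installed ESO release.

```python
import json
from typing import List


def external_secret(
    name: str,
    namespace: str,
    store_name: str,
    remote_key: str,
    properties: List[str],
    store_kind: str = "SecretStore",
    api_version: str = "external-secrets.io/v1beta1",  # assumption: check installed CRDs
) -> dict:
    """Build an ExternalSecret manifest mapping remote properties to Secret keys."""
    if store_kind not in ("SecretStore", "ClusterSecretStore"):
        raise ValueError("store_kind must be SecretStore or ClusterSecretStore")
    return {
        "apiVersion": api_version,
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",
            "secretStoreRef": {"name": store_name, "kind": store_kind},
            "target": {"name": name, "creationPolicy": "Owner"},
            "data": [
                {"secretKey": prop, "remoteRef": {"key": remote_key, "property": prop}}
                for prop in properties
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical app, store, and secret path.
    manifest = external_secret(
        "app-db", "runtime", "openbao", "kv/runtime/db", ["username", "password"]
    )
    print(json.dumps(manifest, indent=2))
```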
- Traefik / Pomerium route plane
  - Local smell: central `edge-bridge` Ingress registry and Pomerium IngressClass that is not actually used by routes.
  - Replace with: Gateway API `GatewayClass`/`Gateway`/`HTTPRoute` for Traefik, or route-level Pomerium Ingress annotations when Pomerium owns authn/authz.
  - Sources: https://doc.traefik.io/traefik/reference/routing-configuration/kubernetes/gateway-api/, https://kubernetes.io/docs/concepts/services-networking/gateway, https://www.pomerium.com/docs/deploy/k8s/ingress
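With Gateway API, each app can own its route instead of registering in a central Ingress list. The sketch below builds one `HTTPRoute` attached to a shared `Gateway`; the gateway name, namespaces, and hostname are placeholders, not values from the FractalOps repo.

```python
import json


def http_route(
    name: str,
    namespace: str,
    hostname: str,
    service: str,
    port: int,
    gateway_name: str,
    gateway_namespace: str,
) -> dict:
    """Build a Gateway API HTTPRoute sending all paths on a host to one Service."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Attach to a Gateway that the platform team owns.
            "parentRefs": [{"name": gateway_name, "namespace": gateway_namespace}],
            "hostnames": [hostname],
            "rules": [
                {
                    "matches": [{"path": {"type": "PathPrefix", "value": "/"}}],
                    "backendRefs": [{"name": service, "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical route for a Grafana service behind a shared Traefik Gateway.
    route = http_route(
        "grafana", "observability", "grafana.example.internal",
        "grafana", 80, "traefik-gateway", "traefik",
    )
    print(json.dumps(route, indent=2))
```

Cross-namespace attachment only works if the Gateway listener's `allowedRoutes` admits the route's namespace, which keeps the platform in control of who may publish routes.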
- Argo CD delivery
  - Local smell: 62 hand-authored Application manifests and rollout governance in `ops/ci/observe_runtime_deployment_rollout.sh`.
  - Replace with: `ApplicationSet`, sync waves/hooks, Argo notifications, and Argo CD Image Updater. GitHub Actions should emit diagnostics only.
  - Sources: https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/applicationset-specification/, https://argo-cd.readthedocs.io/en/latest/user-guide/sync-waves/, https://argo-cd.readthedocs.io/en/stable/operator-manual/notifications/, https://argocd-image-updater.readthedocs.io/en/stable/
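To illustrate collapsing hand-authored Application manifests, the sketch below renders one `ApplicationSet` with a list generator. The repository URL, app entries, project, and sync policy are hypothetical; a git directory generator may fit better if every app already lives under a common path.

```python
import json
from typing import Dict, List


def application_set(
    name: str,
    repo_url: str,
    apps: List[Dict[str, str]],
    revision: str = "main",
    project: str = "default",
) -> dict:
    """Build an ApplicationSet that stamps out one Application per list element."""
    elements = [
        {"name": app["name"], "path": app["path"], "namespace": app["namespace"]}
        for app in apps
    ]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "generators": [{"list": {"elements": elements}}],
            "template": {
                # {{name}}, {{path}}, {{namespace}} are filled per element by Argo CD.
                "metadata": {"name": "{{name}}"},
                "spec": {
                    "project": project,
                    "source": {
                        "repoURL": repo_url,
                        "targetRevision": revision,
                        "path": "{{path}}",
                    },
                    "destination": {
                        "server": "https://kubernetes.default.svc",
                        "namespace": "{{namespace}}",
                    },
                    "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical repo and two of the apps named in this audit.
    apps = [
        {"name": "kafka", "path": "platform/k8s/apps/kafka", "namespace": "kafka"},
        {"name": "mimir", "path": "platform/k8s/apps/mimir", "namespace": "observability"},
    ]
    print(json.dumps(application_set("platform-apps", "https://git.example.internal/platform.git", apps), indent=2))
```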
- DataHub
  - Local smell: backend semantic graph projection/query/snapshot code shadows DataHub Metadata Graph, lineage, GraphQL/OpenAPI, MCP/MCL, Actions, and OpenLineage ingestion. `datahub-ingestion-cron` is disabled while custom lineage graph code grows.
  - Replace with: DataHub entities/aspects/relationships, Dataset/DataJob lineage, DataHub GraphQL for reads, Python SDK/OpenAPI/REST sink for writes, DataHub Actions/Kafka for reactions, and native ingestion cron where useful.
  - Sources: https://docs.datahub.com/docs/metadata-modeling/metadata-model, https://docs.datahub.com/docs/api/graphql/overview, https://docs.datahub.com/docs/metadata-ingestion/as-a-library, https://docs.datahub.com/docs/actions, https://docs.datahub.com/docs/lineage/openlineage
- Hasura
  - Local smell: custom nginx console proxy and curl-based metadata PostSync job.
  - Status: proxy and metadata PostSync job removed; direct Hasura service route remains. Runtime Deployment/Service/ConfigMap now come from the official `hasura/graphql-engine` chart; OpenBao ExternalSecret and Pomerium ingress stay as FractalOps integration overlays.
  - Remaining: metadata lifecycle should stay outside ad-hoc hooks; use `cli-migrations-v3` or `hasura metadata apply` in CI/CD when metadata automation returns.
  - Sources: https://hasura.io/docs/2.0/deployment/deployment-guides/kubernetes-helm/, https://hasura.io/docs/2.0/migrations-metadata-seeds/auto-apply-migrations/
- OpenTelemetry Collector
  - Local smell: custom collector mini-chart with empty values and no upstream presets for Kubernetes attributes or log collection.
  - Status: local collector Deployment/Service/ConfigMap removed; chart is now an upstream `open-telemetry/opentelemetry-collector` wrapper with only FractalOps collector config and Traefik edge overlay.
  - Replace with: upstream `open-telemetry/opentelemetry-collector` chart wrapper with `mode`, `config`, `presets.kubernetesAttributes`, and optional `presets.logsCollection`.
  - Sources: https://github.com/open-telemetry/opentelemetry-helm-charts, https://opentelemetry.io/docs/collector/
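A wrapper around the upstream collector chart mostly reduces to a small values document. The sketch below shows one possible shape using the `mode`, `config`, and `presets` keys named above; the exporter endpoint is a placeholder, and rejecting log collection outside `daemonset` mode is a design choice of this sketch rather than a chart rule.

```python
import json


def collector_values(
    metrics_endpoint: str,
    mode: str = "daemonset",
    collect_logs: bool = False,
) -> dict:
    """Build values for the open-telemetry/opentelemetry-collector chart (sketch)."""
    if mode not in ("daemonset", "deployment", "statefulset"):
        raise ValueError(f"unsupported mode: {mode}")
    if collect_logs and mode != "daemonset":
        # Assumption of this sketch: node log files are only read per node.
        raise ValueError("logsCollection is only enabled here with mode=daemonset")
    return {
        "mode": mode,
        "presets": {
            "kubernetesAttributes": {"enabled": True},
            "logsCollection": {"enabled": collect_logs},
        },
        "config": {
            "exporters": {"otlphttp": {"endpoint": metrics_endpoint}},
            # Helm merges maps but replaces lists, so only the exporter list
            # of the chart's default metrics pipeline is overridden.
            "service": {"pipelines": {"metrics": {"exporters": ["otlphttp"]}}},
        },
    }


if __name__ == "__main__":
    # Hypothetical in-cluster OTLP endpoint.
    print(json.dumps(collector_values("http://mimir/otlp"), indent=2))
```

Recent chart versions may also require `image.repository` to be set explicitly; that key is omitted here and should be checked against the pinned chart version.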
- Headlamp
  - Local smell: custom Deployment, Ingress, OIDC secret, RBAC, and plugin args.
  - Status: local Deployment/Service/Ingress/RBAC templates removed; chart is now an upstream `kubernetes-sigs/headlamp` wrapper with OpenBao ESO and FractalOps group RBAC as `extraManifests`.
  - Replace with: upstream `kubernetes-sigs/headlamp` chart values for OIDC, ingress, RBAC, and plugin manager.
  - Sources: https://github.com/kubernetes-sigs/headlamp/tree/main/charts/headlamp, https://headlamp.dev/docs/latest/installation/in-cluster/oidc/
- Nexus
  - Local smell: custom Deployment, PVC, ConfigMap, Service, and Ingress duplicate official chart behavior.
  - Status: Deployment, Service, PVC, and Ingress moved to upstream `nexus/nexus` chart values. FractalOps keeps only the custom Pomerium auth config, GHCR pull ExternalSecret, and NetworkPolicy overlay.
  - Replace with: official Nexus chart dependency and values for persistence, uplinks, auth config, and ingress.
  - Sources: https://github.com/nexus/charts, https://charts.nexus.org/index.yaml, https://www.nexus.org/docs/configuration
- Mimir
  - Local smell: single `grafana/mimir -target=all` pod is named like an operational Mimir deployment.
  - Status: local all-in-one Deployment/ConfigMap/PVC/Service removed; chart is now an upstream `grafana/mimir-distributed` wrapper. Gateway service name remains `mimir` so existing OTLP/Prometheus clients keep the same in-cluster URL. Storage remains single-replica filesystem until an external object store is the source of truth.
  - Replace with: upstream `grafana/mimir-distributed` chart values for gateway, storage, component replicas, and rollout ownership.
  - Sources: https://grafana.com/docs/helm-charts/mimir-distributed/latest/, https://grafana.com/docs/mimir/latest/references/architecture/deployment-modes/
- Supabase Storage / Realtime
  - Local smell: stale images, duplicated DB secret shapes, embedded Postgres for storage, non-canonical env names, TCP-only probes.
  - Replace with: upstream self-host env contract, current images, one DB secret, HTTP health endpoints, and external DB where possible.
  - Sources: https://supabase.com/docs/guides/self-hosting/storage/config, https://supabase.com/docs/guides/self-hosting/realtime/config, https://raw.githubusercontent.com/supabase/supabase/master/docker/docker-compose.yml
- ClickHouse
  - Local smell: standalone PVC bound into StatefulSet and Argo PostSync Kafka ingest job.
  - Status: standalone PVC and PostSync ingest job removed. The StatefulSet now owns storage through `volumeClaimTemplates`, and Kafka ingest SQL is mounted into `/docker-entrypoint-initdb.d`.
  - Replace with: StatefulSet `volumeClaimTemplates`, ClickHouse native init scripts, or Altinity ClickHouse Operator if cluster operation grows.
  - Sources: https://docs.altinity.com/altinitykubernetesoperator/, https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/, https://clickhouse.com/docs/engines/table-engines/integrations/kafka
- Grafana
  - Local gap: Mimir existed without the dashboard/query surface chart.
  - Status: added upstream `grafana/grafana` chart wrapper with Mimir datasource, OpenBao-backed admin secret, Pomerium ingress, and Longhorn persistence.
  - Replace with: upstream Grafana chart values for datasource provisioning, ingress, persistence, and admin secret wiring.
  - Sources: https://github.com/grafana/helm-charts, https://grafana.com/docs/grafana/latest/administration/provisioning/
- Novu datastores
  - Local smell: Novu chart embedded raw MongoDB/Valkey Deployments, root chown initContainers, hand-written Services, and standalone PVCs.
  - Status: raw MongoDB and Valkey workloads removed. Novu now uses Bitnami MongoDB and Valkey chart dependencies. Existing `novu-mongodb` and `novu-valkey` service names remain stable for app pods; OpenBao still owns the runtime secret.
  - Replace with: upstream chart values for datastore StatefulSets, persistence, probes, securityContext, and service wiring.
  - Sources: https://github.com/bitnami/charts/tree/main/bitnami/mongodb, https://github.com/bitnami/charts/tree/main/bitnami/valkey
- Longhorn
  - Local smell: custom Longhorn Node CR contract for disks and reservations.
  - Status: Longhorn `Node` CR manifests removed from FractalOps GitOps. Runtime storage now sets `createDefaultDiskLabeledNodes=true` and applies standard Kubernetes Node labels/annotations with `Prune=false` for default disk/tag bootstrap. StorageClasses remain explicit FractalOps contracts.
  - Replace with: Longhorn node labels/annotations, default disk config, storage class selectors, and storage tags.
  - Sources: https://longhorn.io/docs/1.12.0/nodes-and-volumes/nodes/default-disk-and-node-config/, https://longhorn.io/docs/1.12.0/references/storage-class-parameters/
- OpenFGA
  - Local smell: tuple write path exists, but model lifecycle, Check, ListObjects/ListUsers, conditions, and contextual tuples are mostly absent.
  - Replace with: explicit OpenFGA model validation plus typed Check/List/Write adapter. Do not infer authorization from local DB-only membership.
  - Sources: https://openfga.dev/docs/configuration-language, https://openfga.dev/docs/getting-started/perform-check, https://openfga.dev/docs/getting-started/perform-list-objects, https://openfga.dev/docs/interacting/contextual-tuples
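A typed Check adapter can stay very small. The sketch below talks to the OpenFGA HTTP Check endpoint using only the standard library; `TupleKey`, `check`, and the injectable `post` transport are hypothetical names, and a real adapter would more likely wrap the official OpenFGA SDK and add authentication.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional, Tuple
from urllib import request


@dataclass(frozen=True)
class TupleKey:
    user: str       # e.g. "user:anne"
    relation: str   # e.g. "viewer"
    object: str     # e.g. "project:alpha"


def build_check_request(
    api_url: str, store_id: str, key: TupleKey, model_id: Optional[str] = None
) -> Tuple[str, dict]:
    """Return the URL and JSON body for POST /stores/{store_id}/check."""
    body = {
        "tuple_key": {"user": key.user, "relation": key.relation, "object": key.object}
    }
    if model_id:
        # Pinning the model keeps decisions stable across model rollouts.
        body["authorization_model_id"] = model_id
    return f"{api_url.rstrip('/')}/stores/{store_id}/check", body


def http_post_json(url: str, body: dict) -> dict:
    """Default transport: plain JSON POST without auth (assumption for the sketch)."""
    req = request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def check(
    api_url: str,
    store_id: str,
    key: TupleKey,
    post: Callable[[str, dict], dict] = http_post_json,
    model_id: Optional[str] = None,
) -> bool:
    """Ask OpenFGA for a decision; a missing 'allowed' field is treated as deny."""
    url, body = build_check_request(api_url, store_id, key, model_id)
    return bool(post(url, body).get("allowed", False))
```

Callers ask `check(...)` for each decision and never fall back to local membership rows; transport errors propagate so the caller can deny explicitly.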
- Windmill
  - Local smell: `reconcile-oidc-job.yaml` directly mutates Windmill database global settings through SQL PostSync.
  - Status: SQL PostSync job removed; OIDC/base URL moved to Windmill operator instanceSpec values. Verify upstream chart schema during next deploy.
  - Replace with: official Windmill Helm operator `windmill.operator.enabled` and `windmill.operator.instanceSpec` for OAuth, base URL, preexisting-user policy, workspace/group/user bootstrap, plus ExternalSecret only for referenced client secrets.
  - Sources: https://www.windmill.dev/docs/core_concepts/infrastructure_as_code, https://raw.githubusercontent.com/windmill-labs/windmill-helm-charts/main/charts/windmill/values.yaml
- Velero
  - Local good path: main `velero-values.yaml` already uses official chart values for backup storage, schedules, and node agent.
  - Local cleanup: `velero-config` separately creates object-store credentials and a MinIO bucket job. Bucket creation belongs to storage provisioning; credentials belong to the Velero chart secret path or pre-provisioned secret.
  - Status: `velero-config` app removed; object-store ExternalSecret moved to Velero chart `extraObjects`; bucket init job removed.
  - Replace with: one Velero chart surface, official `credentials` values, and storage-layer bucket provisioning.
  - Sources: https://velero.io/docs/main/locations/, https://velero.io/docs/v1.17/api-types/backupstoragelocation/, https://velero.io/docs/main/file-system-backup/
- Penpot
  - Local good path: OIDC values mostly use the official Penpot chart.
  - Local cleanup: bundled PostgreSQL/Valkey values and stable-service wrapper are stale; current chart expects external PostgreSQL/Valkey. Hand-created assets PVC can move to chart persistence unless custom labels are required.
  - Status: stable-service wrapper and hand-created assets PVC removed; chart persistence now owns the assets claim. External PostgreSQL/Valkey cutover is still pending.
  - Replace with: external DB/Valkey connection values, chart-managed asset persistence, public/internal OIDC endpoint split, and no stable-service wrapper once dependencies are externalized.
  - Sources: https://help.penpot.app/technical-guide/configuration/, https://github.com/penpot/penpot-helm/blob/develop/charts/penpot/README.md
- Kyverno
  - Local good path: PVC storage contract is correctly modeled as Kyverno policy.
  - Local cleanup: repeated baseline `NetworkPolicy` templates and duplicated app-local wall policy ConfigMaps can move to Kyverno generate/validate when they are platform admission policy.
  - Sources: https://kyverno.io/docs/installation/installation/, https://kyverno.io/docs/policy-types/cluster-policy/validate/, https://kyverno.io/docs/policy-types/cluster-policy/generate/
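For the Kyverno cleanup, a repeated baseline `NetworkPolicy` template could become a single generate rule. The sketch below builds a `ClusterPolicy` that creates a default-deny policy in every new Namespace; treating "baseline" as default-deny is an assumption, as are the policy and rule names.

```python
import json


def baseline_network_policy(policy_name: str = "baseline-default-deny") -> dict:
    """Build a Kyverno ClusterPolicy that generates a NetworkPolicy per Namespace."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": policy_name},
        "spec": {
            "rules": [
                {
                    "name": "default-deny",
                    # Trigger on Namespace admission.
                    "match": {"any": [{"resources": {"kinds": ["Namespace"]}}]},
                    "generate": {
                        "apiVersion": "networking.k8s.io/v1",
                        "kind": "NetworkPolicy",
                        "name": "default-deny",
                        # Kyverno variable: the namespace being created.
                        "namespace": "{{request.object.metadata.name}}",
                        # Keep the generated object in sync with this policy.
                        "synchronize": True,
                        "data": {
                            "spec": {
                                "podSelector": {},  # selects every pod in the namespace
                                "policyTypes": ["Ingress", "Egress"],
                            }
                        },
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(baseline_network_policy(), indent=2))
```

App charts would then ship only their allow rules, while the deny baseline stays owned by platform policy.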
Decision Rules
- Prefer upstream Helm chart or operator CR over local `Deployment`/`StatefulSet` templates.
- Use CUE to produce values and policy inputs, not to recreate Helm templates.
- Keep local charts only for FractalOps-owned services or thin integration overlays.
- Move bootstrap/reconciliation out of shell jobs when the provider has a native controller, hook, CRD, or CLI workflow.
- Keep Kyverno for admission/platform policy; keep OpenFGA/OPA for application authorization only when they are real decision points.