
Engineering Handoff

Read this document when you need one compact engineering handoff for the current FractalOps system.

It does not redefine product law. Start with the documents listed below, in order, then use this page to move from doctrine to execution control.

  1. FractalOps Constitution
  2. FractalOps Canonical Architecture
  3. Architecture Overview
  4. Infra SSOT

FractalOps is the organization meta-control plane.

The stable product path is:

organization meta-control plane -> onboarding -> work -> proposal -> proof -> reflective improvement

Everything in this repository must reinforce that path instead of inventing a competing control surface.

Consult the Execution Naming Glossary before adding another runtime noun. The platform/API control noun is runtime asset; AgentSquad prose should prefer execution surface, execution workspace, agent process adapter, and deploy image.

  • runtime asset
    • a typed machine surface selected by asset_id or role
  • runtime asset control
    • the only allowed application-layer contract for command execution and HTTP control
  • public URL
    • human-facing route, browser-safe, often edge-protected
  • executor URL
    • machine-facing control path, often internal or origin-only
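The terms above can be sketched as a small typed record. This is a minimal illustration, not the repository's actual model: the class name `RuntimeAsset`, its field names, the helpers `select_asset` and `control_url`, and the example URLs are all assumptions made for this sketch. Only the `asset_id`/role selection rule, the `lxc` and `kubernetes` kinds, and the public/executor URL split come from this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RuntimeAsset:
    """Hypothetical typed record; field names are illustrative only."""

    asset_id: str
    kind: str  # first-class kinds named on this page: "lxc", "kubernetes"
    role: str
    public_url: Optional[str] = None    # human-facing, may be edge-protected
    executor_url: Optional[str] = None  # machine-facing control path


def select_asset(
    assets: Sequence[RuntimeAsset],
    *,
    asset_id: Optional[str] = None,
    role: Optional[str] = None,
) -> RuntimeAsset:
    """Select exactly one asset by asset_id or by role, never by raw vmid or namespace."""
    if (asset_id is None) == (role is None):
        raise ValueError("select by exactly one of asset_id or role")
    if asset_id is not None:
        matches = [a for a in assets if a.asset_id == asset_id]
    else:
        matches = [a for a in assets if a.role == role]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one matching asset, found {len(matches)}")
    return matches[0]


def control_url(asset: RuntimeAsset) -> str:
    """Return the machine-facing URL; never fall back to the public URL."""
    if asset.executor_url is None:
        raise LookupError(f"asset {asset.asset_id} has no executor URL")
    return asset.executor_url


# Example inventory with made-up identifiers and URLs.
ASSETS = [
    RuntimeAsset(
        asset_id="asset-001",
        kind="lxc",
        role="langboard_executor",
        public_url="https://board.example.com",
        executor_url="http://board.example.internal:8080",
    ),
    RuntimeAsset(asset_id="asset-002", kind="kubernetes", role="daytona_runtime"),
]
```

Refusing to fall back from executor URL to public URL is the point of the sketch: a silent fallback would route machine control through the browser-facing edge.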

Recent SSOT boundaries:

  • runtime_ssot
    • shared public, internal, and local runtime URL defaults
  • principal_defaults
    • shared CLI/system/audit principal defaults
  • internal/local URL SSOT
    • do not re-embed raw 127.0.0.1, raw cluster service URLs, or one-off fallback URLs in new code unless the SSOT layer is being extended
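One way to picture the URL SSOT rule is a single lookup that every caller goes through, so raw hosts live in exactly one place. The sketch below is an assumption-heavy illustration: the function `runtime_url`, the `EXAMPLE_*_URL` override variables, and every URL in the table are invented here and do not describe the real `runtime_ssot` layer.

```python
import os
from types import MappingProxyType
from typing import Mapping, Optional

# Hypothetical defaults. In the real system these belong to the SSOT layer;
# the hosts and ports below are placeholders for illustration only.
_URL_DEFAULTS = MappingProxyType(
    {
        ("api", "public"): "https://api.example.com",
        ("api", "internal"): "http://api.example.svc.cluster.local:8000",
        ("api", "local"): "http://127.0.0.1:8000",
    }
)


def runtime_url(
    service: str,
    scope: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve a service URL for one scope: public, internal, or local.

    An explicit environment override wins; otherwise the shared default is
    used. Unknown (service, scope) pairs fail loudly instead of guessing.
    """
    env = os.environ if env is None else env
    override = env.get(f"EXAMPLE_{service.upper()}_{scope.upper()}_URL")
    if override:
        return override.rstrip("/")
    try:
        return _URL_DEFAULTS[(service, scope)]
    except KeyError:
        raise LookupError(f"no URL default for {service!r}/{scope!r}") from None
```

Callers would write `runtime_url("api", "internal")` rather than embedding a cluster service URL, and a new default would be added by extending the table, not by adding a one-off fallback at the call site.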

Current first-class runtime asset kinds:

  • lxc
  • kubernetes

FractalOps code should not treat raw vmid, raw namespace, provider keyword heuristics, or connector-local node mappings as canonical meaning.

  • Portal
    • primary human workflow surface
  • api / worker / execution-runtime
    • FractalOps-owned runtime components
  • Proposal Plane
    • only allowed non-read mutation gate
  • Semantics
    • ontology, lineage, and graph truth
  • DataHub
    • catalog and lineage accumulation plane fed by project RDF steward agents; not identity, SCIM/JIT, mutation, or proof authority
  • ClickHouse warehouse
    • warehouse fact/proof plane that stores the same steward lifecycle as queryable events/facts
  • PostHog / OpenTelemetry
    • distributed product/runtime event sources; classify into global ontology ids before ClickHouse accumulation
  • PlaywrightGrid bug replay
    • turns PostHog session/event plus OpenTelemetry trace id into a replay proof with Chronicle/WORM/Supabase Storage artifact refs or owned-domain public evidence URLs
  • GlitchTip
    • Sentry-compatible error/performance tracking plane; per-project DSN auto-provisioned by project_factory, public ingest, official MCP in the armory; not a metrics, ontology, or proof authority
  • Chronicle evidence
    • long-term proof and provenance
  • OpenBao
    • secret source of truth
  • CUE/Helm GitOps
    • infrastructure reconciliation authority for namespace policy, resource limits, placement, storage, and runtime metadata
    • routes, authorization policy, ExternalSecret delivery, HPA/autoscaling, feature-plane runtime, and service inventory stay in dedicated charts/controllers
    • not the durable job runner, not agent graph state, and not workspace file sync
  • Temporal
    • durable job scheduling, retry, activity execution, and recurring schedules
    • not agent graph state, not session truth, not workspace mutation
  • LangGraph agent server
    • agent graph fanout, route frontier, checkpoint/thread continuity, and HITL graph state
    • not the durable job queue and not a replacement for Temporal
  • runtime assets
    • concrete execution surfaces used by application services
  • Studio
    • shared agent execution boundary for run/session/report/activity state
    • owns session control state, not Temporal scheduling or LangGraph checkpoint storage
  • Ouroboros
    • FractalOps self-improvement workflow that runs on Studio and reports only to FractalOps
  • Armory
    • MCP/tool-pack composition boundary for agent initialization, PlaywrightGrid isolation, and future tool families

Execution Substrate vs Integration Endpoints

Use this distinction before adding new control logic.

Strongly-owned execution substrate:

  • portal
  • api
  • worker
  • execution-runtime
  • Temporal
  • DB / Hasura / Supabase Realtime
  • Daytona
  • PlaywrightGrid

Ordinary integration endpoints:

  • Nexus
  • Penpot
  • Dokploy
  • Headlamp
  • many connector targets

Rule:

  • if FractalOps only needs URL, auth, and readiness, treat the system as an integration endpoint
  • only use runtime-asset control when FractalOps truly owns the machine execution boundary
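The rule above reads naturally as a small decision function. The following is a sketch of one possible reading, not code from the repository: `ControlPath`, `choose_control_path`, and the need labels such as `"command_execution"` are hypothetical names. Raising an error for the unowned case is this sketch's interpretation of "only use runtime-asset control when FractalOps truly owns the machine execution boundary".

```python
from enum import Enum
from typing import AbstractSet

# Needs that an ordinary integration endpoint can satisfy, per the rule above.
ENDPOINT_NEEDS = frozenset({"url", "auth", "readiness"})


class ControlPath(Enum):
    INTEGRATION_ENDPOINT = "integration_endpoint"
    RUNTIME_ASSET_CONTROL = "runtime_asset_control"


def choose_control_path(
    needs: AbstractSet[str],
    *,
    owns_execution_boundary: bool,
) -> ControlPath:
    """Pick the control path for a system from what FractalOps needs of it."""
    extra = set(needs) - ENDPOINT_NEEDS
    if not extra:
        # URL, auth, and readiness only: treat it as an integration endpoint.
        return ControlPath.INTEGRATION_ENDPOINT
    if owns_execution_boundary:
        return ControlPath.RUNTIME_ASSET_CONTROL
    raise ValueError(
        f"needs {sorted(extra)} require an owned machine execution boundary"
    )
```

For example, a system that only needs `{"url", "auth", "readiness"}` resolves to the integration-endpoint path even when FractalOps happens to own it, which keeps runtime-asset control reserved for real machine execution.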

LangBoard is a company-owned first-class solution, but it is still an extension surface rather than a truth owner.

Canonical role:

  • LangBoard lifecycle surface
  • knowledge wiki surface
  • bot automation surface

Non-canonical role:

  • identity truth owner
  • proposal authority
  • proof authority

Practical rule:

  • FractalOps owns projection, lineage, and proof links
  • LangBoard owns project-native board, wiki, access, and bot execution behavior
  • machine control uses executor URL, not public Cloudflare paths

Use these entry points first.

fractalops runtime-assets list
fractalops runtime-assets show --role daytona_runtime
fractalops runtime-assets check --asset-id <asset-id>
fractalops runtime-assets run --role daytona_runtime -- python -V
fractalops runtime-assets request --role langboard_executor --method GET --path /health

make infra-validate
make infra-generate-env
make infra-env-check
make infra-apply-ssot

fractalops projects langboard-projection --project-slug <slug> --admin --subject-key <subject>
fractalops projects langboard-sync-plan --project-slug <slug> --admin --subject-key <subject>
fractalops projects langboard-sync --project-slug <slug> --admin --subject-key <subject> --mode scim
fractalops projects langboard-sync --project-slug <slug> --admin --subject-key <subject> --mode access
fractalops projects langboard-sync --project-slug <slug> --admin --subject-key <subject> --mode surface

  • owned machine execution must go through RuntimeAssetControlService
  • ordinary endpoint integrations should prefer typed URL/auth contracts and foundation HTTP clients
  • business operations must not call raw qm guest exec, kubectl, or urllib directly
  • synchronous outbound HTTP should use the foundation HttpClient unless runtime asset control owns the machine boundary
  • local process capture/spawn should go through fractalops.foundation.process_exec, not ad-hoc subprocess.run / subprocess.Popen
  • topology is authoritative only when expressed as typed runtime assets
  • orchestration engines must keep their lane:
    • CUE/Helm GitOps: infrastructure desired-state reconciliation only
    • Temporal: durable jobs only
    • LangGraph: agent graph continuity only
    • Studio: run/session/control truth only
    • operator CLIs: live observation/recovery drivers only
    • platform CI image build: GitOps-pinned runtime-image release only (no per-project or in-sandbox build plane)
  • shell scripts are wrappers or adapters, not doctrine
  • public URL, executor URL, and local/internal URLs must stay separate when edge protection or browser routing would corrupt machine control
  • live operator/minimap truth is portal_live_events -> harness-projection
  • Ouroboros public continuity is fresh | resume
  • project agent squads and Ouroboros share Studio primitives, but their reporting targets differ: project squads report to their bound project issue surface, Ouroboros reports to FractalOps
  • Armory configuration must be runtime/tool initialization, not prompt-only convention
  • managed README blocks should sync from .agent-os/config/managed-blocks.tsv through the todo skill scripts, not hand-edited copies

These areas are still implementation detail and may keep adapter-specific transport code as long as the application contract stays clean:

  • platform/k8s/* wrapper scripts
  • ops/lxc/* utility scripts
  • controller internals under runtime asset control adapters

These areas are not allowed to regress:

  • backend/src/fractalops/contexts/*/application
  • backend/src/fractalops/cli.py
  • portal API/read-model naming
  • topology env generation contract

The repository is cleaner now, but a few files still carry transport-heavy implementation detail and should be treated as the next refactor queue rather than canonical design examples.

  • backend/src/fractalops/contexts/access/application/integration/runtime_asset_control.py
    • canonical adapter boundary; raw qm guest exec remains here by design
  • backend/src/fractalops/contexts/access/application/integration/native_operations.py
    • now acts mostly as orchestration shell + patch seam
    • keep new capability logic in native_ops_* modules, not back in this file
  • backend/src/fractalops/contexts/access/application/integration/native_ops_langboard.py
    • owns LangBoard planners and sync executors
  • backend/src/fractalops/contexts/access/application/integration/native_ops_langboard_support.py
    • owns LangBoard API/metadata/status helpers; preserve native_operations wrappers for test patch seams only
  • backend/src/fractalops/contexts/access/application/integration/native_ops_dokploy_support.py
    • owns Dokploy API/ensure/transport helpers; preserve native_operations wrappers for patch seams only
  • backend/src/fractalops/contexts/access/application/integration/native_ops_project_support.py
    • owns project/env helpers; preserve native_operations wrappers where tests patch shared resolvers
  • backend/src/fractalops/contexts/access/application/integration/native_ops_runtime_support.py
    • owns ssh/qemu/keycloak-idp/connector-ssot runtime helpers; preserve native_operations wrappers for patch seams and shared resolver injection
  • backend/src/fractalops/contexts/harness_runtime/
    • LangGraph Harness Runtime is the only reusable agent execution boundary; retired Execution bridge modules must not be restored.
  • backend/src/fractalops/contexts/access/application/integration/studio_runner.py
    • browser executor probe is now standardized on the official MCP Python SDK helper
    • avoid hand-writing JSON-RPC initialize / tools/list; use mcp_executor_probe.py
  • backend/src/fractalops/contexts/access/application/studio/scenario_policy.py
    • still contains a small direct subprocess seam

Before handing off to the next engineer, confirm:

  1. The requested flow is named in canonical terms, not stack-local slang.
  2. The execution surface is selected by runtime asset asset_id or role.
  3. Secrets come from OpenBao or explicit env overlay, not lab fallback files.
  4. Public URLs and executor URLs are not mixed.
  5. LangBoard changes are split between:
    • generic upstream-safe product behavior
    • FractalOps-specific projection and sync logic
  6. New docs route back to the constitution and canonical architecture instead of creating a parallel doctrine.

make test-unit
make test-contract
make test-integration
make codegen-check
git diff --check

Portal checks live in yamonco/fractalops-frontend:

pnpm run lint:frontend:ci
pnpm run portal:check
pnpm run portal:build

Validation policy:

  • test-unit uses the quiet fail-only unittest_runner
  • test-contract uses the curated Schemathesis contract suite
  • test-integration uses shared runtime smoke only
  • shared service 5xx should be recorded as skip + report, not bypassed with local Docker fallback
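The last policy point, skip plus report on a shared service 5xx, can be sketched with the standard `unittest.SkipTest` exception. The helper name `skip_on_shared_5xx`, the in-memory `SKIP_REPORT` list, and the service label are assumptions for illustration; how the real suite records its report is not described on this page.

```python
import unittest
from typing import Dict, List, Union

# Hypothetical in-memory report; a real suite would persist this somewhere.
SKIP_REPORT: List[Dict[str, Union[str, int]]] = []


def skip_on_shared_5xx(service: str, status_code: int) -> None:
    """Record a shared-service 5xx and skip the current test.

    The failure is reported and the test is skipped. No local fallback is
    started, so the outage stays visible instead of being masked.
    """
    if 500 <= status_code <= 599:
        SKIP_REPORT.append({"service": service, "status": status_code})
        raise unittest.SkipTest(f"shared service {service} returned {status_code}")


class SmokeExample(unittest.TestCase):
    def test_shared_service_health(self) -> None:
        status_code = 503  # stand-in for a real response status
        skip_on_shared_5xx("example-shared-service", status_code)
        self.assertEqual(status_code, 200)
```

Raising `unittest.SkipTest` inside a test marks it as skipped rather than failed, while the appended report entry preserves evidence of the outage.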

For focused runtime asset validation:

PYTHONPATH=backend/src .tools/uv/uv run --group dev python -m fractalops.testing.unittest_runner \
tests/test_runtime_asset_ops_normalization.py \
tests/test_generate_env.py \
tests/test_cli_operations.py \
tests/test_operation_fabric.py