
Harness Runtime

Harness Runtime is the LangGraph Agent Server-backed orchestration kernel for FractalOps agent execution.

It is not Studio, Ouroboros, Armory, CodexGate, or a browser plane. It owns the execution graph that coordinates those surfaces.

  • Harness Runtime owns run state, graph transitions, checkpoint identity, agent process adapter selection, and replay boundaries.
  • Studio owns operator UX, templates, run history, and live read models.
  • Ouroboros is the FractalOps self-improvement graph template.
  • Agent Squad is a project-delivery graph template.
  • Armory owns tool, skill, MCP, secret, and browser-slot loadout composition.
  • Temporal owns durable queueing, activity retry, timers, and workspace/source/process lifecycle deadlines.
  • LangGraph Agent Server owns thread state, graph transitions, fan-out, frontier routing, and resume inside the current Temporal activity.

The word runtime must not absorb workspace, provider, image, process, or file-transfer language. Use these terms instead:

  • use execution surface in AgentSquad prose for the provider surface; use runtime asset only at platform/API boundaries
  • use execution workspace for the per-agent Daytona/workspace lease
  • use workspace bootstrap asset for a file delivered before launch
  • use source ref bundle only for read-only reference/source archives
  • use agent process adapter for the CLI/browser/LangGraph executor that runs one attempt
  • use deploy image in AgentSquad prose for a deployable container artifact; use runtime image only when naming the compatibility field or platform release artifact
  • use Agent HUD for Studio’s compact operator projection
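The two AgentSquad prose substitutions above are mechanical enough to lint. Below is a minimal sketch, assuming a hypothetical `PREFERRED_TERMS` table and `lint_agentsquad_prose` helper (neither is an existing FractalOps tool). It flags only the platform-boundary terms. It cannot tell whether a sentence sits at a platform/API or release boundary, where those terms remain correct.

```python
import re

# Hypothetical lint table built from the naming rules above:
# platform-boundary term -> preferred AgentSquad prose term.
PREFERRED_TERMS = {
    "runtime asset": "execution surface",
    "runtime image": "deploy image",
}

_PATTERN = re.compile(
    "|".join(re.escape(term) for term in PREFERRED_TERMS), re.IGNORECASE
)


def lint_agentsquad_prose(text: str) -> list[tuple[str, str]]:
    """Return (found_term, suggested_term) pairs in order of appearance.

    This sketch does not detect platform/API or release boundaries, where
    the runtime terms remain the correct names.
    """
    return [
        (match.group(0), PREFERRED_TERMS[match.group(0).lower()])
        for match in _PATTERN.finditer(text)
    ]
```

For example, `lint_agentsquad_prose("Pick a runtime asset, then ship the Runtime Image.")` reports both terms with their AgentSquad replacements.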

Daytona is the execution workspace provider, not the lifecycle owner. Harness Runtime may use Daytona labels, auto-stop/archive/delete intervals, prepared snapshots, forks, and sandbox telemetry as provider controls. Temporal still owns deadlines, retry, and phase decisions. Never snapshot a dirty agent-owned worktree before PR evidence exists; snapshot only prepared source/tool states or post-delivery proof states.
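The snapshot rule can be expressed as a Harness-side guard that runs before any Daytona snapshot call. This is a minimal sketch: `SnapshotSubject`, `SnapshotRequest`, `pr_evidence_url`, and `snapshot_allowed` are hypothetical names, and the sketch does not call any Daytona API. Deadlines and retries stay with Temporal and are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SnapshotSubject(Enum):
    # Hypothetical classification of what a Daytona snapshot would capture.
    PREPARED_SOURCE = "prepared_source"
    PREPARED_TOOLS = "prepared_tools"
    AGENT_WORKTREE = "agent_worktree"
    POST_DELIVERY_PROOF = "post_delivery_proof"


@dataclass(frozen=True)
class SnapshotRequest:
    subject: SnapshotSubject
    pr_evidence_url: Optional[str] = None  # hypothetical field


def snapshot_allowed(request: SnapshotRequest) -> bool:
    """Permit only prepared source/tool states or post-delivery proof states."""
    if request.subject in (SnapshotSubject.PREPARED_SOURCE, SnapshotSubject.PREPARED_TOOLS):
        return True
    if request.subject is SnapshotSubject.POST_DELIVERY_PROOF:
        # A proof state only exists once PR evidence exists.
        return request.pr_evidence_url is not None
    # In-flight agent-owned worktrees are never snapshotted; once PR evidence
    # exists, the state should be captured as POST_DELIVERY_PROOF instead.
    return False
```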

Production payloads may still contain runtime_kind, runtimeKind, and BOARD_STATUS. Treat them as compatibility fields:

  • Studio runtime_kind means delivery graph kind.
  • Harness adapter runtimeKind means agent process adapter.
  • BOARD_STATUS means legacy AGENT_HUD_STATUS.
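One way to keep these compatibility fields from leaking inward is to rename them once, at the boundary where a payload enters. Below is a minimal sketch, assuming a hypothetical `normalize_compat_fields` helper and hypothetical canonical key names (`delivery_graph_kind`, `agent_process_adapter`). The `origin` argument matters because `runtime_kind` and `runtimeKind` mean different things depending on who sent the payload.

```python
from typing import Any


def normalize_compat_fields(payload: dict[str, Any], origin: str) -> dict[str, Any]:
    """Rename legacy compatibility fields at the boundary.

    origin is "studio" or "adapter"; the canonical key names used here are
    hypothetical and only make each meaning explicit.
    """
    if origin not in ("studio", "adapter"):
        raise ValueError(f"unknown payload origin: {origin!r}")
    renames = {"BOARD_STATUS": "AGENT_HUD_STATUS"}
    if origin == "studio":
        renames["runtime_kind"] = "delivery_graph_kind"
    else:
        renames["runtimeKind"] = "agent_process_adapter"
    # Unknown keys pass through unchanged.
    return {renames.get(key, key): value for key, value in payload.items()}
```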

Every agent session is normalized into an agent process adapter spec. The current wire-compatible model name is AgentRuntimeSpec; treat it as a legacy payload name for the adapter spec, not as a second Harness Runtime.

  • langgraph-native: LangGraph node/subgraph agent for PM, triage, formatting, or deterministic coordination.
  • codex-cli: Codex CLI adapter.
  • claude-cli: Claude Code CLI adapter.
  • zai-cli: ZAI CLI adapter.
  • browser-only: PlaywrightGrid-centered browser agent.
  • workspace-cli: workspace-backed CLI agent.

The orchestration graph can mix these adapters in one run. CLI agents stay first-class, but LangGraph coordinates them rather than acting as just another external runner.

cliproxy is not an agent process adapter. It is the credential gateway used by CLI adapters such as codex-cli and claude-cli when they need an OpenAI-compatible, Codex-compatible, or Claude-compatible proxy backed by account artifacts. Armory binds this as a credential artifact policy, not as another executor.
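The adapter list and the cliproxy rule together suggest a closed set of adapter kinds, with the legacy `AgentRuntimeSpec` payload parsed into it. Below is a minimal sketch. The wire field names `runtimeKind` and `agentSessionId` are assumptions about the payload shape, and `AdapterKind`, `AgentProcessAdapterSpec`, and `parse_legacy_runtime_spec` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping


class AdapterKind(str, Enum):
    """The agent process adapters listed above."""
    LANGGRAPH_NATIVE = "langgraph-native"
    CODEX_CLI = "codex-cli"
    CLAUDE_CLI = "claude-cli"
    ZAI_CLI = "zai-cli"
    BROWSER_ONLY = "browser-only"
    WORKSPACE_CLI = "workspace-cli"


@dataclass(frozen=True)
class AgentProcessAdapterSpec:
    """Canonical adapter spec; AgentRuntimeSpec is only its legacy wire name."""
    agent_session_id: str
    adapter: AdapterKind


def parse_legacy_runtime_spec(wire: Mapping[str, Any]) -> AgentProcessAdapterSpec:
    """Parse an AgentRuntimeSpec-shaped payload (wire field names are assumptions)."""
    kind = wire["runtimeKind"]
    if kind == "cliproxy":
        # cliproxy is a credential gateway bound by Armory, never an executor.
        raise ValueError("cliproxy is a credential gateway, not an agent process adapter")
    # AdapterKind(kind) raises ValueError for any value not in the list above.
    return AgentProcessAdapterSpec(agent_session_id=wire["agentSessionId"], adapter=AdapterKind(kind))
```

Because the result is a plain list of specs, one run can mix adapters freely, for example a `langgraph-native` PM next to a `codex-cli` coder and a `browser-only` tester.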

Production runs use the standalone LangGraph Agent Server shape, not langgraph dev.

  • the Agent Server image is built from ops/containers/fractalops-agent-server/Dockerfile
  • the API, worker, and CLI runtime image remains fractalops-api
  • DATABASE_URI is the Agent Server persistence store for assistants, threads, runs, checkpoints, and queue state
  • REDIS_URI is the Agent Server streaming and background-run broker
  • local .langgraph_api, SQLite, MemorySaver, and in-process checkpoint stores are invalid production paths
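A startup check can refuse to boot when the persistence settings look local. This is a minimal sketch with a hypothetical `agent_server_env_problems` helper. The set of local-only URI schemes is an illustrative assumption, not an exhaustive list, and the check does not detect in-process stores such as MemorySaver, which are a code-path concern rather than an environment setting.

```python
from typing import Mapping
from urllib.parse import urlparse

# Schemes treated as local-only stores; an illustrative, non-exhaustive set.
LOCAL_ONLY_SCHEMES = {"sqlite", "file", "memory"}


def agent_server_env_problems(env: Mapping[str, str]) -> list[str]:
    """Return configuration problems; an empty list means the env looks production-shaped."""
    problems = [f"{key} is not set" for key in ("DATABASE_URI", "REDIS_URI") if not env.get(key)]
    database_uri = env.get("DATABASE_URI", "")
    # "postgresql+psycopg" -> "postgresql"; "sqlite" stays "sqlite".
    scheme = urlparse(database_uri).scheme.split("+")[0]
    if database_uri and scheme in LOCAL_ONLY_SCHEMES:
        problems.append("DATABASE_URI points at a local-only store")
    return problems
```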

Agent Server logs and traces converge through FractalOps OpenTelemetry, not a separate LangSmith vendor path.

  • enable Agent Server APM with LS_APM_OTEL_ENABLED=true
  • enable LangChain/LangGraph application spans with LANGSMITH_OTEL_ENABLED=true and LANGSMITH_TRACING_MODE=otel
  • export to the in-cluster OTLP collector with standard OTEL_* variables
  • keep LANGSMITH_API_KEY out of the production secret unless explicitly switching to LangSmith-hosted trace storage
  • keep LOG_JSON=true so Kubernetes logs stay parseable by the shared collector/log pipeline
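The telemetry rules above translate directly into an environment check. Below is a minimal sketch with a hypothetical `telemetry_env_problems` helper. It uses `OTEL_EXPORTER_OTLP_ENDPOINT` as the representative standard OTEL variable, which is an assumption about which OTEL_* settings the deployment uses.

```python
from typing import Mapping

# Expected values taken from the telemetry rules above.
REQUIRED_TELEMETRY_ENV = {
    "LS_APM_OTEL_ENABLED": "true",
    "LANGSMITH_OTEL_ENABLED": "true",
    "LANGSMITH_TRACING_MODE": "otel",
    "LOG_JSON": "true",
}


def telemetry_env_problems(env: Mapping[str, str], allow_langsmith_hosted: bool = False) -> list[str]:
    """List deviations from the OpenTelemetry-only trace path."""
    problems = [
        f"{key} should be {expected!r}"
        for key, expected in REQUIRED_TELEMETRY_ENV.items()
        if env.get(key, "").lower() != expected
    ]
    if not env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT is not set")
    # A LangSmith key is only acceptable after an explicit switch to hosted storage.
    if "LANGSMITH_API_KEY" in env and not allow_langsmith_hosted:
        problems.append("LANGSMITH_API_KEY present without an explicit LangSmith-hosted opt-in")
    return problems
```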

Production checkpoints are owned by LangGraph Agent Server. FractalOps does not compile graphs with MemorySaver, local SQLite, or hand-rolled checkpoint stores in the runtime path.

Canonical thread identity:

thread_id = studio_run_id
run_id = Agent Server execution attempt
launched session identity = studio agent_session_id
checkpoint namespace = agent/session/subgraph identity
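The identity mapping can be pinned down in a small value type so that every caller derives `thread_id` and the checkpoint namespace the same way. This is a minimal sketch: `HarnessIdentity`, `agent_id`, `subgraph`, and the "/" separator are assumptions. `run_id` is deliberately absent because the Agent Server assigns it per execution attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarnessIdentity:
    studio_run_id: str     # Studio run; doubles as the Agent Server thread_id
    agent_id: str          # hypothetical agent name, e.g. "tester"
    agent_session_id: str  # Studio's launched session identity
    subgraph: str          # hypothetical subgraph name, e.g. "qa"

    @property
    def thread_id(self) -> str:
        # One Agent Server thread per Studio run; every attempt (run_id) reuses it.
        return self.studio_run_id

    @property
    def checkpoint_namespace(self) -> str:
        # agent/session/subgraph; the "/" separator is an assumption of this sketch.
        return f"{self.agent_id}/{self.agent_session_id}/{self.subgraph}"
```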

The default run graph is:

intake_run
-> bind_armory
-> request_workspace_activity
-> dispatch_agent_attempts
-> run_agent_attempt fan-out using Send inside the active Temporal phase
-> collect_reports
-> decide_next using Command
-> publish_state
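The node order above can be modeled without LangGraph to show where fan-out and routing happen. The sketch below is plain Python, not LangGraph code: the comments mark where the real graph would emit one `Send` per agent and where `decide_next` would return a `Command`. `RunState`, `run_graph`, the per-lane "finished" flag, and `max_rounds` are hypothetical. The loop models only graph-level routing; activity retries stay with Temporal.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunState:
    studio_run_id: str
    agents: list[str]
    reports: dict[str, bool] = field(default_factory=dict)  # agent -> lane finished?
    rounds: int = 0
    status: str = "running"
    trace: list[str] = field(default_factory=list)


def run_graph(
    state: RunState,
    run_agent_attempt: Callable[[str, int], bool],
    max_rounds: int = 2,
) -> RunState:
    """Walk the default node order; fan-out and routing are modeled in plain Python."""
    state.trace += ["intake_run", "bind_armory", "request_workspace_activity"]
    pending = list(state.agents)
    while True:
        state.trace.append("dispatch_agent_attempts")
        state.rounds += 1
        # Fan-out: the LangGraph graph would emit one Send per pending agent here.
        for agent in pending:
            state.trace.append(f"run_agent_attempt:{agent}")
            state.reports[agent] = run_agent_attempt(agent, state.rounds)
        state.trace += ["collect_reports", "decide_next"]
        unfinished = [agent for agent, done in state.reports.items() if not done]
        if unfinished and state.rounds < max_rounds:
            # Graph-level routing back to dispatch (a Command(goto=...) in LangGraph).
            pending = unfinished
            continue
        state.status = "blocked" if unfinished else "done"
        break
    state.trace.append("publish_state")
    return state
```

Keeping the graph this small is what lets adapter differences hide behind `run_agent_attempt` instead of adding nodes.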

This graph is intentionally small. Adapter-specific complexity belongs behind Armory manifests and the agent process adapters exposed to Agent Server, not in Studio prompts or Portal components.

Agent-to-agent communication is an inbox event inside the same Harness thread.

  • the sender writes an AgentInboxEntry
  • Studio enqueues a same-thread Agent Server run
  • the graph routes the pending inbox event before the next agent attempt
  • prompt prose is not the source of truth for handoff state
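The inbox flow keeps handoff state in the thread rather than in prompts. Below is a minimal sketch: only the name `AgentInboxEntry` comes from the text. Its fields, `HarnessThread`, and `drain_for` are hypothetical, and the Studio step that enqueues the same-thread Agent Server run is only marked with a comment.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Mapping


@dataclass(frozen=True)
class AgentInboxEntry:
    # Field names are hypothetical; only the type name comes from the text above.
    thread_id: str
    sender: str
    recipient: str
    payload: Mapping[str, Any]


@dataclass
class HarnessThread:
    thread_id: str
    inbox: Deque[AgentInboxEntry] = field(default_factory=deque)

    def write_inbox(self, entry: AgentInboxEntry) -> None:
        """Sender side: record the handoff in thread state, not in prompt prose."""
        if entry.thread_id != self.thread_id:
            raise ValueError("agent-to-agent messages stay inside one Harness thread")
        self.inbox.append(entry)
        # In the described flow, Studio would now enqueue a same-thread Agent Server run.

    def drain_for(self, agent: str) -> list[AgentInboxEntry]:
        """Graph side: route pending entries for `agent` before its next attempt."""
        ready = [entry for entry in self.inbox if entry.recipient == agent]
        self.inbox = deque(entry for entry in self.inbox if entry.recipient != agent)
        return ready
```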

Use these names across Studio, AgentSquad, workspace shims, tests, and operator runbooks. The canonical glossary lives in Execution Naming Glossary; AgentSquad-specific delivery terms live in AgentSquad Delivery Language. Do not introduce near-synonyms in prompts or payloads.

  • execution surface: AgentSquad-facing name for the provider surface that can run work. It maps to runtime asset only at platform/API boundaries.
  • deploy image: AgentSquad-facing name for the container image released by CI. It maps to runtime image only at platform/release boundaries.
  • controller message: the operator or graph instruction carried into the next agent attempt. It may include a durable delivery reference.
  • delivery reference: a GitHub pull request or issue URL that proves where agent work converges. Local artifact paths are not delivery references.
  • missing delivery reference blocker: the generic blocker represented in runtime payloads as github_issue_or_pr_link_missing.
  • agent-owned worktree: the writable Git worktree for one agent lane.
  • assembly worktree: the shared read/observation checkout that lets agents inspect cross-repository context without taking Git ownership.
  • handoff: a structured inbox transfer to another agent. Handoff is not a substitute for a delivery reference when the lane is expected to close work.
  • merge lifecycle decision: the committer’s closed decision after inspecting linked PRs and tester proof. The runtime blocker merge_lifecycle_decision_required means a delivery reference exists but the canonical fields are missing: either handoff_state=merge_requested with merge_gate_status=qa_passed, or handoff_state=merged.
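The two blockers are ordered: a lane first needs a delivery reference, and only then is its merge lifecycle decision checked. Below is a minimal sketch, assuming it runs for a lane expected to close work, such as the committer's. `delivery_blockers` and the URL pattern (public github.com only, no GitHub Enterprise hosts) are assumptions.

```python
import re
from typing import Mapping

# GitHub pull request or issue URLs; local artifact paths never match.
DELIVERY_REF = re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+/(?:pull|issues)/\d+/?$")


def delivery_blockers(refs: list[str], fields: Mapping[str, str]) -> list[str]:
    """Return runtime blocker codes for one lane, using the names defined above."""
    if not any(DELIVERY_REF.match(ref) for ref in refs):
        return ["github_issue_or_pr_link_missing"]
    merged = fields.get("handoff_state") == "merged"
    merge_requested = (
        fields.get("handoff_state") == "merge_requested"
        and fields.get("merge_gate_status") == "qa_passed"
    )
    if not (merged or merge_requested):
        return ["merge_lifecycle_decision_required"]
    return []
```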

New Studio and Ouroboros execution requests must record harnessRuntime metadata and enter through the Harness Runtime Agent Server gateway. Legacy Execution/Kanban runtime branches and local session probes are not valid execution paths.
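An admission check at request intake can enforce both conditions. This is a minimal sketch: the `entrypoint` field, the `GATEWAY_ENTRYPOINT` value, and `admit_execution_request` are hypothetical. Only the `harnessRuntime` metadata key comes from the text above.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical entrypoint label for the Harness Runtime Agent Server gateway.
GATEWAY_ENTRYPOINT = "harness-agent-server-gateway"


def admit_execution_request(request: Mapping[str, Any]) -> None:
    """Raise ValueError unless a new Studio/Ouroboros request is gateway-shaped."""
    metadata = request.get("harnessRuntime")
    if not isinstance(metadata, Mapping) or not metadata:
        raise ValueError("missing harnessRuntime metadata")
    if request.get("entrypoint") != GATEWAY_ENTRYPOINT:
        # Legacy Execution/Kanban branches and local session probes land here.
        raise ValueError("execution must enter through the Harness Runtime Agent Server gateway")
```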