AgentSquad Runtime — Current Implementation Map

Snapshot of the current code (not the target architecture). Scope: the studio project_delivery / agentsquad template path. File:line citations are evidence; verify before relying on them — code moves.

TL;DR — two planes, two “graphs”, one live dispatcher

  • Temporal = the durable when. It triggers run execution and runs the periodic heartbeat sweep that ticks every live run. It does not itself step agents.
  • The control-plane tick = the live how. run_dispatch.execute_run() is the real reconcile loop: it launches each agent session into a Daytona workspace, enforces the first-tool gate, observes runner completion, revives mailbox handoffs, and closes the run. This is the path the live squad actually runs on (it is where waiting_first_tool, dispatch_slot_prompt, mailbox revive, etc. live).
  • LangGraph appears in two distinct, narrower roles — and neither is “the squad brain”:
    1. a compiled StateGraph in harness_runtime that models one execution wave (intake → fanout → collect → reconcile → decide), run via a remote LangGraph Agent Server; and
    2. spec/contract-only metadata (agent_graph/contract.py, langgraph_prompt_contract.py) embedded in the desired spec and prompt shape. The agent reasoning and the handoff decisions live in neither graph.

The single biggest architectural finding: two overlapping orchestration representations (the LangGraph harness wave-graph vs. the control-plane execute_run tick) model the same “dispatch → observe → reconcile → decide-next” cycle. The live project-delivery path is the control-plane tick; the harness LangGraph graph’s live wiring for project-delivery runs is unclear and is the prime consolidation target.

backend/src/fractalops/contexts/orchestration/infrastructure/temporal_workflows.py defines 8 workflows over a shared _WorkflowPhaseState base:

| Workflow | Owns |
| --- | --- |
| ProposalMutationWorkflow | proposal-bound mutations |
| ReviewDecisionWorkflow | approval/review (signal-driven state) |
| RuntimeAssetOperationWorkflow | node wake/sleep, wizard tries |
| ProjectDeletionWorkflow | deferred teardown (purge-delay sleep) |
| ConnectorSyncWorkflow | identity reconcile / access sync |
| ProofClosureWorkflow | dataset proof closure (scheduled) |
| StudioRunWorkflow | AgentSquad run execution (dispatch/observe trigger) |
| ScheduledReconciliationWorkflow | periodic background jobs |

Squad trigger path (Temporal triggers; the activity does the work inline):

HTTP → StudioRuntimeCommandService.queue_run_execution()
→ start_studio_run(kind=STUDIO_RUN_EXECUTE, run_id) temporal_client.py:138
→ start_typed_workflow(STUDIO_RUN, …) [async fire-and-forget] temporal_client.py:57
→ StudioRunWorkflow.run() temporal_workflows.py:304
→ studio_run_activity() temporal_workflows.py:137
→ run_studio_run_temporal() → _run_default_studio_run_execute(run_id)
→ runtime_command_service.execute_pending_run_execution(run_id) [INLINE dispatch/observe]

Heartbeat sweep: a Temporal schedule fires StudioRunWorkflow(STUDIO_RUN_EXECUTE) with no run_id every temporal_studio_interval_seconds; the activity sweeps every run whose status ∈ {admitted, launching, running, blocked} and queues+executes each inline (orchestration_scheduled_reconciliation_handlers.py:221-277).
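The sweep's selection rule can be sketched as a plain filter. This is a minimal illustration, not the handler itself: the `Run` record and `select_runs_to_sweep` are hypothetical names, and only the four statuses come from the description above.

```python
from dataclasses import dataclass

# Statuses the sweep treats as live, as listed in the paragraph above.
LIVE_STATUSES = frozenset({"admitted", "launching", "running", "blocked"})


@dataclass(frozen=True)
class Run:
    # Hypothetical minimal run record; the real model carries far more state.
    run_id: str
    status: str


def select_runs_to_sweep(runs):
    """Return the ids of runs the heartbeat sweep should tick, in input order."""
    return [run.run_id for run in runs if run.status in LIVE_STATUSES]


runs = [
    Run("r1", "running"),
    Run("r2", "completed"),
    Run("r3", "blocked"),
    Run("r4", "admitted"),
]
swept = select_runs_to_sweep(runs)  # r2 is terminal, so it is skipped
```

Each selected run would then be queued and executed inline by the activity, as described above.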

“API queues; Temporal executes” holds, with a nuance: for squad runs the dispatch/observe cycle runs synchronously inside the activity — no per-session sub-workflows.

temporal_enabled=false (lab/embedded): no inline fallback. The worker sleeps (temporal_worker.py:121), but any start_studio_run() raises RuntimeError("Temporal is disabled") (temporal_client.py:63). So squad dispatch is hard-coupled to Temporal being on.
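A rough sketch of what that hard coupling looks like from a caller's side. `FakeTemporalClient` and `try_start` are hypothetical stand-ins; only the `start_studio_run()` name, the `STUDIO_RUN_EXECUTE` kind, and the error message are taken from the text above.

```python
from typing import Optional


class FakeTemporalClient:
    """Hypothetical stand-in for the client; only the disabled guard is modelled."""

    def __init__(self, temporal_enabled: bool) -> None:
        self.temporal_enabled = temporal_enabled
        self.started = []

    def start_studio_run(self, kind: str, run_id: Optional[str] = None) -> str:
        if not self.temporal_enabled:
            # Mirrors the described behaviour: there is no inline fallback.
            raise RuntimeError("Temporal is disabled")
        self.started.append((kind, run_id))
        return f"{kind}:{run_id}"


def try_start(client: FakeTemporalClient, run_id: str) -> str:
    """Attempt to start a run and surface the failure as a string for inspection."""
    try:
        return client.start_studio_run("STUDIO_RUN_EXECUTE", run_id)
    except RuntimeError as exc:
        return f"error: {exc}"
```

With `temporal_enabled=False` the call fails before anything is queued, so no squad dispatch happens at all.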

(1) Harness wave-graph (actually compiled & executable). harness_runtime/application/graph.py:459 compiles a StateGraph over HarnessGraphState: intake_run → bind_armory → allocate_workspace → route_frontier → dispatch_agents (Send() fanout to run_agent_attempt) → collect → reconcile_inbox → decide_next (Command) → publish_state. It is invoked via LangGraphAgentServerGateway (the langgraph_sdk client → a remote LangGraph Agent Server; gateway.py:37,58-93) from HarnessRuntimeService.execute_studio_run() (runtime_service.py:168). run_agent_attempt only queues a session to the runtime — it does not run agent reasoning, and handoffs are explicitly not fed through graph edges (graph.py:223-229: “the mailbox lives in the dispatch plane”).

(2) Spec/contract-only (not executed). orchestration/domain/agent_graph/contract.py:agent_graph_role_contract() emits per-role LangGraph identity metadata (thread_owner="langgraph", assistant/skill/MCP bindings) that agentsquad_desired_spec.py:88-96 stamps into the desired state. langgraph_prompt_contract.py is a message-shape adapter so agent state matches the LangGraph/LangChain prompt convention.

(3) portal_copilot/application/graph.py — a separate 2-node copilot graph, unrelated to squad execution.

Net: LangGraph is not the squad’s decision engine. It (a) optionally models one concurrency wave and (b) supplies contract/prompt shapes. The live per-tick dispatch is the control-plane loop below.

Run lifecycle (the live control-plane path)


Create (POST /v1/admin/studio/runs):

create_run() studio_control_plane_service.py:353
→ run_lifecycle.create_run() → _create_run_with_preflight() run_lifecycle.py:260,338
→ desired_agent_squad_spec(); provision_agent_roster_identities()
→ repo.create_run(); _create_seed_sessions() → create_seed_sessions() features/lifecycle/session_seed.py
→ runtime_command_service.queue_run_execution() (→ Temporal STUDIO_RUN_EXECUTE)

Tick (run_dispatch.execute_run(), ~per heartbeat) — for each session:

  1. infer runner paths + real first tool from events
  2. revive terminal sessions with pending mailbox (_revive_terminal_mailbox_session)
  3. clear stale walls when source evidence is observed (dispatch_wall_fallback)
  4. first-tool deadline (run_dispatch.py:752-817): if waiting_first_tool and elapsed ≥ FIRST_TOOL_EVIDENCE_DEADLINE_SECONDS (120s, clock resets on each liveness signal) → wall agent_first_tool_required / source_evidence_wall
  5. release terminal slots
  6. observe runner completion (observe_daytona_toolbox_slot_session → apply_execution_runner_result)
  7. launch ready sessions → dispatch_run_session_prompt → run_daytona_toolbox_slot_session (daytona_runner)
  8. close_run_when_all_sessions_terminal()
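The first-tool deadline in step 4 can be sketched as follows. This is an illustration under assumptions: the function name and the `last_liveness_at` parameter are hypothetical, and the sketch only models the "clock resets on each liveness signal" rule by measuring elapsed time from the most recent signal.

```python
FIRST_TOOL_EVIDENCE_DEADLINE_SECONDS = 120.0  # value quoted in step 4


def first_tool_deadline_expired(status: str, last_liveness_at: float, now: float) -> bool:
    """True when a session still waiting for its first tool has been silent too long.

    last_liveness_at is the timestamp (in seconds) of the most recent liveness
    signal, so every new signal restarts the clock.
    """
    if status != "waiting_first_tool":
        return False
    return (now - last_liveness_at) >= FIRST_TOOL_EVIDENCE_DEADLINE_SECONDS
```

A session that emitted a liveness signal at t=100 is therefore not walled at t=150, even if it was launched at t=0.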

First-tool gate (project_delivery_metadata.py): satisfied by fractalops_workspace_*, mcp__*, claude-native (Bash/Write/Edit/Read/Glob/Grep), or codex (codex:shell:*, codex:*). (Runtime-agnostic — codex coders clear it with any codex: call.)
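One way to read the gate is as a name matcher. The sketch below assumes the `*` patterns are plain prefix matches, which the source does not confirm; `satisfies_first_tool_gate` is a hypothetical name, and the tool names in the usage note are made up for illustration.

```python
# Claude-native tools named in the paragraph above.
CLAUDE_NATIVE_TOOLS = frozenset({"Bash", "Write", "Edit", "Read", "Glob", "Grep"})

# Prefix reading of fractalops_workspace_*, mcp__* and codex:* (assumption).
# codex:shell:* is already covered by the broader codex: prefix.
GATE_PREFIXES = ("fractalops_workspace_", "mcp__", "codex:")


def satisfies_first_tool_gate(tool_name: str) -> bool:
    """True when a tool call counts as first-tool evidence under this reading."""
    return tool_name in CLAUDE_NATIVE_TOOLS or tool_name.startswith(GATE_PREFIXES)
```

Under this reading, `Bash`, `mcp__github__create_pr`, and `codex:shell:ls` would all clear the gate, while a tool outside the listed families would not.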

Report → handoff → mailbox → next launch:

report_recorder.record_report() report_recorder.py:146
→ _record_report_handoff() → StudioReportHandoffRecorder.record_handoff() report_handoff.py:223
→ per target: _find_agent_session() | _spawn_handoff_target_session()→ensure_handoff_target_session()
→ build_handoff_payload() → create_or_refresh_pending_mailbox_entry()
→ queue_agent_mailbox_delivery() → session.status="ready", pendingMailboxDelivery=True
→ next execute_run tick revives the target with the mailbox payload
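The handoff flow above can be modelled in a few lines. This is a simplified sketch, not the recorder's code: `Session`, `record_handoff`, and `revive_with_mailbox` are hypothetical, the "pending" and "running" statuses are assumptions, and payloads are appended rather than refreshed in place as `create_or_refresh_pending_mailbox_entry()` suggests the real code does.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical minimal session record.
    role: str
    status: str = "completed"
    pending_mailbox_delivery: bool = False
    mailbox: list = field(default_factory=list)


def record_handoff(sessions: dict, target_role: str, payload: dict) -> Session:
    """Queue a handoff: find or spawn the target, store the payload, mark it ready."""
    session = sessions.get(target_role)
    if session is None:
        # No existing session for this role, so spawn one.
        session = sessions[target_role] = Session(role=target_role, status="pending")
    session.mailbox.append(payload)
    session.status = "ready"
    session.pending_mailbox_delivery = True
    return session


def revive_with_mailbox(session: Session):
    """What the next tick does: hand the queued payloads to the relaunched session."""
    if not session.pending_mailbox_delivery:
        return None
    delivered, session.mailbox = session.mailbox, []
    session.pending_mailbox_delivery = False
    session.status = "running"
    return delivered
```

The point of the sketch is that the handoff is stored data plus a status flip; nothing is launched until a later tick picks the session up.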

Completion → PR: project_delivery_completion_requires_pr() (project_delivery_completion_policy.py:303) blocks backend/frontend/contract completion that lacks pr_url+branch_name; run_result_projection.py then auto-hands-off to committer/tester.
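A minimal sketch of that completion rule, assuming backend, frontend, and contract are role names and that the report is a flat mapping; `completion_requires_pr` is a hypothetical simplification of the real policy function.

```python
# Roles whose completion needs PR evidence, per the paragraph above.
PR_REQUIRED_ROLES = frozenset({"backend", "frontend", "contract"})


def completion_requires_pr(role: str, report: dict) -> bool:
    """True when completion should be blocked because PR evidence is missing."""
    if role not in PR_REQUIRED_ROLES:
        return False
    # Both fields must be present and non-empty to let the completion through.
    return not (report.get("pr_url") and report.get("branch_name"))
```

A backend report with only `pr_url` set is still blocked, because `branch_name` is missing.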

| Subflow | Where |
| --- | --- |
| Coordination-completion handoff (planner/curator → next) | run_result_projection._dispatch_completion_handoff |
| Source-ref delivery handoff (coder → committer/tester) | run_result_projection._dispatch_source_ref_handoff |
| Mailbox delivery / revive | features/team/mailbox.py + run_dispatch.py:498 |
| Wall fallback (retry / skip / workspace reset) | features/execution/wall_fallback_contract.py |
| Replay (fresh retry of terminal session) | features/runtime/run_replay.py |
| Session intervention (pause/resume/override/switch_runtime) | features/runtime/session_intervention.py |
| Heartbeat / liveness | features/runtime/agent_heartbeat.py |
| Cleanup (release all slots) | features/execution/run_closure.py |

Duplicate / hop-only patterns to consolidate


Ranked by payoff:

  1. Plan→execute split between the LangGraph graph and the control-plane tick.

     Correction (deeper read): these are not two parallel engines. execute_pending_run_execution (pending_run.py) runs them as a pipeline:
       • Planner: the LangGraph graph (execute_studio_run) picks the frontier/wave (which sessions launch this turn, concurrency, armory bind) and returns launched_sessions.
       • Executor: the control-plane tick (service.execute_run(session_ids=<graph's chosen sessions>)) launches exactly those sessions into daytona and runs the lifecycle (first-tool gate, observe-runner, close, mailbox-revive).

     LangGraph is therefore already the orchestration brain. The real duplication is that the tick (run_dispatch, ~1,850 lines, plus run_result_projection, ~1,960 lines) also holds orchestration decision logic (frontier, lifecycle, close, handoff routing) that overlaps the graph's.

     P2: move that decision logic into graph nodes and leave the tick a pure executor (daytona I/O only). The graph's run_agent_attempt currently emits exec-cell metadata that the tick re-derives; unify on the graph as the single decision authority.

     Decided (operator, 2026-06): LangGraph is the standard orchestration engine. Rationale: beyond coding agents, the squad integrates into the broader LangChain/LangGraph ecosystem later, so the orchestration brain must be a LangGraph StateGraph, not a bespoke tick loop.

     Target shape (see “Target: LangGraph as the standard engine” below): the control-plane tick’s responsibilities (first-tool gate, observe-runner, launch, close, mailbox revive) fold into LangGraph nodes; the daytona workspace and mailbox become the substrate the graph drives, not a parallel engine; runtime execution is a clean AgentRuntime port with claude/codex/antigravity adapters. Migrate incrementally and non-breakingly; do not rip out the tick before the graph path is proven end-to-end.
  2. First-tool wall decision logic split across run_dispatch.py and run_result_projection.py (_silence_window_origin, _source_evidence_first_tool_observed) — one gate, two homes.
  3. Execution-slot metadata merge duplicated: execution_slot_prompt_dispatch_metadata() (studio_run_execution_command.py:631) vs. merged_execution_slot_metadata() (run_dispatch.py:577).
  4. Mailbox-revive metadata duplicated: mailbox.py vs. run_result_projection._fresh_mailbox_revive_metadata.
  5. DI-by-parameter wiring across native_ops_* + the native_operations.py facade (412 kwarg-forward lines + ~72 Callable params) — being collapsed; see [[squad-provisioning-audit]] family work.
  6. Thin delegation hops: studio_control_plane_service (_create_and_execute_run, _launch_prompt, _preflight, _assert_preflight), run_lifecycle (5× _*_project_squad_*), Temporal enqueue_workflow (backend.py:11) + request_background_studio_* (background_workflow.py) — collapse to direct calls/aliases where they add no logic.
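The planner → executor pipeline described in item 1 can be sketched as two functions chained by a third. Everything here is a stand-in under assumptions: sessions are reduced to a status string, `plan_wave` is a hypothetical placeholder for the graph's frontier selection, and only the names `execute_pending_run_execution`, `execute_run`, `session_ids`, and `launched_sessions` come from the text.

```python
def plan_wave(sessions: dict, max_concurrency: int) -> list:
    """Planner stand-in: pick which ready sessions launch this turn."""
    ready = [sid for sid, status in sessions.items() if status == "ready"]
    return ready[:max_concurrency]


def execute_run(sessions: dict, session_ids: list) -> list:
    """Executor stand-in: launch exactly the sessions the planner chose."""
    for sid in session_ids:
        sessions[sid] = "running"
    return list(session_ids)


def execute_pending_run_execution(sessions: dict, max_concurrency: int = 2) -> list:
    """Pipeline: the planner decides, the executor acts on that decision only."""
    launched_sessions = plan_wave(sessions, max_concurrency)
    return execute_run(sessions, session_ids=launched_sessions)
```

In this shape the executor makes no selection of its own, which is the property the consolidation in item 1 aims for.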

Target: LangGraph as the standard engine

Decision (operator, 2026-06): the AgentSquad orchestration brain is a LangGraph StateGraph. Everything else is a clean port/adapter around it so the squad slots into the LangChain/LangGraph ecosystem (checkpointers, LangSmith, subgraphs, human-in-the-loop) without a rewrite.

Three layers, sharp boundaries:

  1. Engine (LangGraph StateGraph) — the single source of orchestration truth: intake → bind armory → allocate workspace → route frontier → dispatch → collect → reconcile inbox → decide-next → publish. The control-plane tick (run_dispatch.execute_run) is absorbed into graph nodes (first-tool gate, observe-runner, launch, close, mailbox revive become node logic or tool-calls the graph makes). One engine — the bespoke tick stops being a second brain. State lives in HarnessGraphState; persistence via a LangGraph checkpointer (so a run is resumable/inspectable through the LangGraph runtime, not a side DB tick).

  2. Runtime port (AgentRuntime) — a narrow interface the dispatch/run_agent_attempt node calls to actually execute one agent turn, with adapters: ClaudeRuntime, CodexRuntime, AntigravityRuntime. Each adapter owns its CLI/transport specifics — MCP wiring, hooks, skills config, auth/base-url (e.g. cliproxy bridge vs native) — behind the same port. No codex-only shims leak into the engine (closes the “codex agents not reaching cliproxy” class of bug: that becomes one adapter’s contract, testable in isolation). See [[runtime-abstraction-not-codex-only]].

  3. Substrate — daytona workspace (git worktrees), the mailbox handoff bus, and Temporal as the durable trigger/heartbeat for the engine. These are things the engine drives, not alternative orchestrators. Mailbox stays the handoff transport; the graph reads it in reconcile_inbox (handoffs are data, not graph edges).
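The runtime port in layer 2 could take a shape like the sketch below. It is one possible design, not the planned interface: `TurnRequest`, `TurnResult`, `run_turn`, and the `RUNTIMES` registry are hypothetical, and the adapters are empty shells that only show where runtime-specific wiring would live.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TurnRequest:
    session_id: str
    prompt: str


@dataclass(frozen=True)
class TurnResult:
    session_id: str
    runtime: str
    accepted: bool


class AgentRuntime(Protocol):
    """Narrow port: the engine only ever sees this method."""

    name: str

    def run_turn(self, request: TurnRequest) -> TurnResult: ...


class ClaudeRuntime:
    name = "claude"

    def run_turn(self, request: TurnRequest) -> TurnResult:
        # A real adapter would own Claude-specific CLI, MCP and auth wiring here.
        return TurnResult(request.session_id, self.name, accepted=True)


class CodexRuntime:
    name = "codex"

    def run_turn(self, request: TurnRequest) -> TurnResult:
        # A real adapter would own Codex-specific transport and base-url wiring here.
        return TurnResult(request.session_id, self.name, accepted=True)


RUNTIMES = {rt.name: rt for rt in (ClaudeRuntime(), CodexRuntime())}


def run_agent_attempt(runtime_name: str, request: TurnRequest) -> TurnResult:
    """Engine-side call: pick the adapter by name, never branch on runtime details."""
    runtime: AgentRuntime = RUNTIMES[runtime_name]
    return runtime.run_turn(request)
```

Because the engine depends only on the protocol, each adapter can be tested in isolation, and an additional adapter is one more registry entry rather than a change to the graph.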

Migration is incremental and non-breaking: (P1) extract the AgentRuntime port + adapters and route the current dispatch through it; (P2) fold the tick’s gate/observe/launch/close into graph nodes and make the LangGraph engine the live driver for project-delivery; (P3) prove a run end-to-end on the engine, then retire the parallel tick. Never delete the tick before the graph path is proven live.