AgentSquad Runtime — Current Implementation Map

Snapshot of the current code (not the target architecture). Scope: the studio project_delivery / agentsquad template path. File:line citations are evidence; verify before relying on them — code moves.

TL;DR — two planes, two “graphs”, one live dispatcher

Temporal = the durable when. It triggers run execution and runs the periodic heartbeat sweep that ticks every live run. It does not itself step agents.
The control-plane tick = the live how. run_dispatch.execute_run() is the real reconcile loop: it launches each agent session into a Daytona workspace, enforces the first-tool gate, observes runner completion, revives mailbox handoffs, and closes the run. This is the path the live squad actually runs on (it is where waiting_first_tool, dispatch_slot_prompt, mailbox revive, etc. live).
LangGraph appears in two distinct, narrower roles — and neither is “the squad brain”:
1. a compiled StateGraph in harness_runtime that models one execution wave (intake → fanout → collect → reconcile → decide), run via a remote LangGraph Agent Server; and
2. spec/contract-only metadata (agent_graph/contract.py, langgraph_prompt_contract.py) embedded in the desired spec and prompt shape.

The agent reasoning and the handoff decisions live in neither graph.

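The single biggest architectural finding: two overlapping orchestration representations (the LangGraph harness wave-graph and the control-plane execute_run tick) model the same “dispatch → observe → reconcile → decide-next” cycle. The live project-delivery path is the control-plane tick. On first read, the harness graph’s live wiring for project-delivery runs looked unclear; the correction under “Duplicate / hop-only patterns to consolidate” records that pending_run.py runs the two as a pipeline (graph as planner, tick as executor). The overlap in decision logic between them remains the prime consolidation target.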

Temporal involvement

backend/src/fractalops/contexts/orchestration/infrastructure/temporal_workflows.py defines 8 workflows over a shared _WorkflowPhaseState base:

Workflow	Owns
`ProposalMutationWorkflow`	proposal-bound mutations
`ReviewDecisionWorkflow`	approval/review (signal-driven state)
`RuntimeAssetOperationWorkflow`	node wake/sleep, wizard tries
`ProjectDeletionWorkflow`	deferred teardown (purge-delay sleep)
`ConnectorSyncWorkflow`	identity reconcile / access sync
`ProofClosureWorkflow`	dataset proof closure (scheduled)
`StudioRunWorkflow`	AgentSquad run execution (dispatch/observe trigger)
`ScheduledReconciliationWorkflow`	periodic background jobs

Squad trigger path (Temporal triggers; the activity does the work inline):

HTTP → StudioRuntimeCommandService.queue_run_execution()
     → start_studio_run(kind=STUDIO_RUN_EXECUTE, run_id)         temporal_client.py:138
     → start_typed_workflow(STUDIO_RUN, …)  [async fire-and-forget]  temporal_client.py:57
     → StudioRunWorkflow.run()                                    temporal_workflows.py:304
     → studio_run_activity()                                      temporal_workflows.py:137
     → run_studio_run_temporal() → _run_default_studio_run_execute(run_id)
       → runtime_command_service.execute_pending_run_execution(run_id)   [INLINE dispatch/observe]

Heartbeat sweep: a Temporal schedule fires StudioRunWorkflow(STUDIO_RUN_EXECUTE) with no run_id every temporal_studio_interval_seconds; the activity sweeps every run whose status ∈ {admitted, launching, running, blocked} and queues+executes each inline (orchestration_scheduled_reconciliation_handlers.py:221-277).
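The sweep's selection rule can be sketched as follows. This is a minimal illustration, not the handler code: `Run`, `sweep_live_runs`, and the `execute` callback are hypothetical names. Only the four live statuses and the "no run_id means sweep every live run" behaviour come from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Statuses the sweep treats as live, as listed above.
LIVE_STATUSES = frozenset({"admitted", "launching", "running", "blocked"})


@dataclass
class Run:
    """Hypothetical stand-in for a stored run record."""
    run_id: str
    status: str


def sweep_live_runs(
    runs: Iterable[Run],
    execute: Callable[[str], None],
    run_id: Optional[str] = None,
) -> List[str]:
    """Execute one run if run_id is given, otherwise every live run.

    `execute` stands in for the queue+execute step and is called
    synchronously, inside the caller.
    """
    if run_id is not None:
        execute(run_id)
        return [run_id]
    ticked = []
    for run in runs:
        if run.status in LIVE_STATUSES:
            execute(run.run_id)
            ticked.append(run.run_id)
    return ticked
```

Because `execute` is called in a plain loop, one slow run delays the runs behind it in the same sweep. That is a property of this sketch; whether the real handler behaves the same way should be checked against the cited file.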

“API queues; Temporal executes” holds, with a nuance: for squad runs the dispatch/observe cycle runs synchronously inside the activity — no per-session sub-workflows.

temporal_enabled=false (lab/embedded): no inline fallback. The worker sleeps (temporal_worker.py:121), but any start_studio_run() raises RuntimeError("Temporal is disabled") (temporal_client.py:63). So squad dispatch is hard-coupled to Temporal being on.
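A minimal sketch of that hard coupling, assuming a settings object with a `temporal_enabled` flag. `Settings`, `start_studio_run_sketch`, and the returned workflow id are illustrative stand-ins, not the contents of temporal_client.py; only the error message comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical settings holder; only temporal_enabled matters here."""
    temporal_enabled: bool = True


def start_studio_run_sketch(settings: Settings, run_id: str) -> str:
    """Refuse to start when Temporal is off; there is no inline fallback."""
    if not settings.temporal_enabled:
        raise RuntimeError("Temporal is disabled")
    # The real path would start a workflow here. The sketch returns a
    # placeholder id so it stays self-contained.
    return f"studio-run-{run_id}"
```

A lab or embedded setup that wants squad dispatch without Temporal would need an explicit fallback branch at this point; none is described in the current code.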

LangGraph involvement

(1) Harness wave-graph (actually compiled & executable). harness_runtime/application/graph.py:459 compiles a StateGraph over HarnessGraphState: intake_run → bind_armory → allocate_workspace → route_frontier → dispatch_agents (Send() fanout to run_agent_attempt) → collect → reconcile_inbox → decide_next (Command) → publish_state. It is invoked via LangGraphAgentServerGateway (the langgraph_sdk client → a remote LangGraph Agent Server; gateway.py:37,58-93) from HarnessRuntimeService.execute_studio_run() (runtime_service.py:168). run_agent_attempt only queues a session to the runtime — it does not run agent reasoning, and handoffs are explicitly not fed through graph edges (graph.py:223-229: “the mailbox lives in the dispatch plane”).

(2) Spec/contract-only (not executed). orchestration/domain/agent_graph/contract.py:agent_graph_role_contract() emits per-role LangGraph identity metadata (thread_owner="langgraph", assistant/skill/MCP bindings) that agentsquad_desired_spec.py:88-96 stamps into the desired state. langgraph_prompt_contract.py is a message-shape adapter so agent state matches the LangGraph/LangChain prompt convention.

(3) portal_copilot/application/graph.py — a separate 2-node copilot graph, unrelated to squad execution.

Net: LangGraph is not the squad’s decision engine. It (a) optionally models one concurrency wave and (b) supplies contract/prompt shapes. The live per-tick dispatch is the control-plane loop below.
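The single wave described under (1) can be modelled in plain Python to make the fan-out and collect steps concrete. This sketch deliberately does not use the LangGraph API. The function names mirror the node names above, but the state fields (`ready`, `queued`, `has_remaining`) and the `max_parallel` parameter are assumptions for illustration.

```python
from typing import Dict, List


def route_frontier(state: Dict, max_parallel: int = 2) -> List[str]:
    """Pick the sessions that may launch in this wave (assumed rule)."""
    return state["ready"][:max_parallel]


def run_agent_attempt(session_id: str) -> Dict:
    """Queue one session. No agent reasoning happens in this node."""
    return {"session_id": session_id, "status": "queued"}


def run_wave(state: Dict) -> Dict:
    """One wave: route the frontier, fan out, collect, decide."""
    frontier = route_frontier(state)
    # Fan-out: one attempt per frontier session (Send() in the real graph).
    attempts = [run_agent_attempt(sid) for sid in frontier]
    # Collect: merge attempt results back into the shared state.
    queued = state.get("queued", []) + [a["session_id"] for a in attempts]
    remaining = [sid for sid in state["ready"] if sid not in frontier]
    # Decide: report whether sessions are left for a later wave.
    return {
        "ready": remaining,
        "queued": queued,
        "has_remaining": bool(remaining),
    }
```

Note that handoffs never appear as edges here: the wave only queues sessions, which matches the statement that the mailbox lives in the dispatch plane.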

Run lifecycle (the live control-plane path)

Create (POST /v1/admin/studio/runs):

create_run()                                   studio_control_plane_service.py:353
 → run_lifecycle.create_run() → _create_run_with_preflight()   run_lifecycle.py:260,338
   → desired_agent_squad_spec(); provision_agent_roster_identities()
   → repo.create_run(); _create_seed_sessions() → create_seed_sessions()   features/lifecycle/session_seed.py
 → runtime_command_service.queue_run_execution()  (→ Temporal STUDIO_RUN_EXECUTE)

Tick (run_dispatch.execute_run(), ~per heartbeat) — for each session:

1. infer runner paths + real first tool from events
2. revive terminal sessions with pending mailbox (_revive_terminal_mailbox_session)
3. clear stale walls when source evidence is observed (dispatch_wall_fallback)
4. first-tool deadline (run_dispatch.py:752-817): if waiting_first_tool and elapsed ≥ FIRST_TOOL_EVIDENCE_DEADLINE_SECONDS (120s, clock resets on each liveness signal) → wall agent_first_tool_required / source_evidence_wall
5. release terminal slots
6. observe runner completion (observe_daytona_toolbox_slot_session → apply_execution_runner_result)
7. launch ready sessions → dispatch_run_session_prompt → run_daytona_toolbox_slot_session (daytona_runner)
8. close_run_when_all_sessions_terminal()
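The first-tool deadline step is the most timing-sensitive part of the tick. A minimal sketch, assuming the clock origin is the later of the launch time and the last liveness signal: `SessionClock` and `first_tool_wall` are hypothetical names, while the 120-second constant, the `waiting_first_tool` status, and the wall code come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_TOOL_EVIDENCE_DEADLINE_SECONDS = 120  # value quoted above


@dataclass
class SessionClock:
    """Hypothetical per-session timing state (seconds since some epoch)."""
    status: str
    launched_at: float
    last_liveness_at: Optional[float] = None


def first_tool_wall(session: SessionClock, now: float) -> Optional[str]:
    """Return the wall code if the first-tool deadline has elapsed."""
    if session.status != "waiting_first_tool":
        return None
    # The clock restarts from the latest liveness signal, if there is one.
    origin = session.launched_at
    if session.last_liveness_at is not None:
        origin = max(origin, session.last_liveness_at)
    if now - origin >= FIRST_TOOL_EVIDENCE_DEADLINE_SECONDS:
        return "agent_first_tool_required"
    return None
```

Resetting on liveness means a session that keeps emitting heartbeats but never calls a tool is not walled by this check alone; whether the real code caps that is something to verify at the cited lines.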

First-tool gate (project_delivery_metadata.py): satisfied by fractalops_workspace_*, mcp__*, claude-native (Bash/Write/Edit/Read/Glob/Grep), or codex (codex:shell:*, codex:*). (Runtime-agnostic — codex coders clear it with any codex: call.)
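The gate's matching rule can be sketched as a name check. This assumes the `*` patterns above are plain prefix matches and that matching is case-sensitive; `satisfies_first_tool_gate` is a hypothetical name, not the function in project_delivery_metadata.py.

```python
# Claude-native tools listed above; matched by exact name.
CLAUDE_NATIVE_TOOLS = frozenset({"Bash", "Write", "Edit", "Read", "Glob", "Grep"})

# Prefix patterns listed above. "codex:" also covers "codex:shell:".
GATE_PREFIXES = ("fractalops_workspace_", "mcp__", "codex:")


def satisfies_first_tool_gate(tool_name: str) -> bool:
    """True if this tool call counts as first-tool evidence."""
    return tool_name in CLAUDE_NATIVE_TOOLS or tool_name.startswith(GATE_PREFIXES)
```

Keeping every runtime's patterns in one predicate is what makes the gate runtime-agnostic: adding a runtime means adding a prefix, not a second gate.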

Report → handoff → mailbox → next launch:

report_recorder.record_report()                report_recorder.py:146
 → _record_report_handoff() → StudioReportHandoffRecorder.record_handoff()   report_handoff.py:223
   → per target: _find_agent_session() | _spawn_handoff_target_session()→ensure_handoff_target_session()
   → build_handoff_payload() → create_or_refresh_pending_mailbox_entry()
   → queue_agent_mailbox_delivery() → session.status="ready", pendingMailboxDelivery=True
 → next execute_run tick revives the target with the mailbox payload
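The handoff-to-revive cycle above can be sketched with in-memory stand-ins. `Session`, `record_handoff_sketch`, and `revive_on_tick` are hypothetical, as is the mailbox entry shape; the `status="ready"` and `pendingMailboxDelivery=True` markers are the ones named in the trace.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Session:
    """Hypothetical session record."""
    agent: str
    status: str = "completed"
    metadata: Dict = field(default_factory=dict)


def record_handoff_sketch(
    sessions: Dict[str, Session],
    mailbox: Dict[str, dict],
    targets: List[str],
    payload: dict,
) -> None:
    """Find or spawn each target, write its mailbox entry, mark it ready."""
    for target in targets:
        session = sessions.get(target)
        if session is None:
            session = Session(agent=target, status="pending")  # spawn target
            sessions[target] = session
        # Create or refresh the pending mailbox entry for this target.
        mailbox[target] = {"payload": payload, "state": "pending"}
        session.status = "ready"
        session.metadata["pendingMailboxDelivery"] = True


def revive_on_tick(sessions: Dict[str, Session], mailbox: Dict[str, dict]) -> List[str]:
    """On the next tick, launch ready sessions with their mailbox payload."""
    launched = []
    for agent, session in sessions.items():
        if session.status == "ready" and session.metadata.get("pendingMailboxDelivery"):
            entry = mailbox[agent]
            session.metadata["prompt_payload"] = entry["payload"]
            session.metadata["pendingMailboxDelivery"] = False
            entry["state"] = "delivered"
            session.status = "running"
            launched.append(agent)
    return launched
```

The handoff itself launches nothing. It only leaves state behind for the next tick to pick up, which is why a stalled heartbeat also stalls handoffs.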

Completion → PR: project_delivery_completion_requires_pr() (project_delivery_completion_policy.py:303) blocks backend/frontend/contract completion that lacks pr_url+branch_name; run_result_projection.py then auto-hands-off to committer/tester.
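A minimal sketch of the completion rule, assuming the report is a mapping and that an empty string counts as missing. `completion_blocked_for_missing_pr` is a hypothetical name; the three roles and the two required fields are taken from the sentence above.

```python
from typing import Mapping

# Roles whose completion needs PR evidence, as listed above.
PR_REQUIRED_ROLES = frozenset({"backend", "frontend", "contract"})


def completion_blocked_for_missing_pr(role: str, report: Mapping[str, str]) -> bool:
    """True when a completion report must be blocked for lack of PR evidence."""
    if role not in PR_REQUIRED_ROLES:
        return False
    # Both fields are required; a missing or empty value blocks completion.
    return not (report.get("pr_url") and report.get("branch_name"))
```

For example, a backend report carrying only `pr_url` is blocked, while a planner report with neither field passes, because planners are outside the three listed roles.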

Subflows

Subflow	Where
Coordination-completion handoff (planner/curator → next)	`run_result_projection._dispatch_completion_handoff`
Source-ref delivery handoff (coder → committer/tester)	`run_result_projection._dispatch_source_ref_handoff`
Mailbox delivery / revive	`features/team/mailbox.py` + `run_dispatch.py:498`
Wall fallback (retry / skip / workspace reset)	`features/execution/wall_fallback_contract.py`
Replay (fresh retry of terminal session)	`features/runtime/run_replay.py`
Session intervention (pause/resume/override/switch_runtime)	`features/runtime/session_intervention.py`
Heartbeat / liveness	`features/runtime/agent_heartbeat.py`
Cleanup (release all slots)	`features/execution/run_closure.py`

Duplicate / hop-only patterns to consolidate

Ranked by payoff:

1. Plan→execute split between the LangGraph graph and the control-plane tick.
   Correction (deeper read): these are not two parallel engines. execute_pending_run_execution (pending_run.py) runs them as a pipeline. The LangGraph graph (execute_studio_run) is the planner: it picks the frontier/wave (which sessions launch this turn, concurrency, armory bind) and returns launched_sessions. The control-plane tick (service.execute_run(session_ids=<graph's chosen sessions>)) is the executor: it launches exactly those sessions into Daytona and runs the lifecycle (first-tool gate, observe-runner, close, mailbox-revive). LangGraph is therefore already the orchestration brain.
   The real duplication: the tick (run_dispatch ~1850 lines + run_result_projection ~1960 lines) also holds orchestration decision logic (frontier/lifecycle/close/handoff-routing) that overlaps the graph’s. P2 = move that decision logic into graph nodes and leave the tick a pure executor (Daytona I/O only). The graph’s run_agent_attempt currently emits exec-cell metadata that the tick re-derives; unify on the graph as the single decision authority.
   Decided (operator, 2026-06): LangGraph is the standard orchestration engine. Rationale: beyond coding agents, the squad will later integrate into the broader LangChain/LangGraph ecosystem, so the orchestration brain must be a LangGraph StateGraph, not a bespoke tick loop.
   Target shape (see “Target: LangGraph as the standard engine” below): the control-plane tick’s responsibilities (first-tool gate, observe-runner, launch, close, mailbox revive) fold into LangGraph nodes; the Daytona workspace + mailbox become the substrate the graph drives, not a parallel engine; runtime execution is a clean AgentRuntime port with claude/codex/antigravity adapters. Migrate incrementally and non-breakingly; do not rip out the tick before the graph path is proven end-to-end.
2. First-tool wall decision logic split across run_dispatch.py and run_result_projection.py (_silence_window_origin, _source_evidence_first_tool_observed): one gate, two homes.
3. Execution-slot metadata merge duplicated: execution_slot_prompt_dispatch_metadata() (studio_run_execution_command.py:631) vs. merged_execution_slot_metadata() (run_dispatch.py:577).
4. Mailbox-revive metadata duplicated: mailbox.py vs. run_result_projection._fresh_mailbox_revive_metadata.
5. DI-by-parameter wiring across native_ops_* + the native_operations.py facade (412 kwarg-forward lines + ~72 Callable params). This is being collapsed; see [[squad-provisioning-audit]] family work.
6. Thin delegation hops: studio_control_plane_service (_create_and_execute_run, _launch_prompt, _preflight, _assert_preflight), run_lifecycle (5× _*_project_squad_*), Temporal enqueue_workflow (backend.py:11) + request_background_studio_* (background_workflow.py). Collapse these to direct calls/aliases where they add no logic.
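The planner-then-executor pipeline described in the first item of this list can be sketched as below. The names `plan_wave` and `execute_pending_run_sketch`, the `max_parallel` rule, and the `launch` callback are hypothetical; only the shape (the planner returns `launched_sessions`, the executor launches exactly those) comes from the text.

```python
from typing import Callable, Dict, List


def plan_wave(ready: List[str], max_parallel: int) -> Dict[str, List[str]]:
    """Planner stand-in: choose which sessions launch this turn."""
    return {"launched_sessions": ready[:max_parallel]}


def execute_pending_run_sketch(
    ready: List[str],
    launch: Callable[[str], None],
    max_parallel: int = 2,
) -> List[str]:
    """Run the planner, then launch exactly the sessions it chose."""
    plan = plan_wave(ready, max_parallel)
    launched = []
    for session_id in plan["launched_sessions"]:
        # Executor stand-in: pure I/O, no re-deciding of the frontier.
        launch(session_id)
        launched.append(session_id)
    return launched
```

In this shape the executor holds no selection logic of its own, which is the end state P2 aims for. The duplication called out above is the case where the executor also re-derives what the planner already decided.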

Target: LangGraph as the standard engine

Decision (operator, 2026-06): the AgentSquad orchestration brain is a LangGraph StateGraph. Everything else is a clean port/adapter around it so the squad slots into the LangChain/LangGraph ecosystem (checkpointers, LangSmith, subgraphs, human-in-the-loop) without a rewrite.

Three layers, sharp boundaries:

1. Engine (LangGraph StateGraph). The single source of orchestration truth: intake → bind armory → allocate workspace → route frontier → dispatch → collect → reconcile inbox → decide-next → publish. The control-plane tick (run_dispatch.execute_run) is absorbed into graph nodes: first-tool gate, observe-runner, launch, close, and mailbox revive become node logic or tool-calls the graph makes. There is one engine; the bespoke tick stops being a second brain. State lives in HarnessGraphState; persistence goes through a LangGraph checkpointer, so a run is resumable and inspectable through the LangGraph runtime rather than a side DB tick.
2. Runtime port (AgentRuntime). A narrow interface that the dispatch/run_agent_attempt node calls to actually execute one agent turn, with adapters: ClaudeRuntime, CodexRuntime, AntigravityRuntime. Each adapter owns its CLI/transport specifics (MCP wiring, hooks, skills config, auth/base-url, e.g. cliproxy bridge vs native) behind the same port. No codex-only shims leak into the engine. This closes the “codex agents not reaching cliproxy” class of bug: it becomes one adapter’s contract, testable in isolation. See [[runtime-abstraction-not-codex-only]].
3. Substrate. The Daytona workspace (git worktrees), the mailbox handoff bus, and Temporal as the durable trigger/heartbeat for the engine. These are things the engine drives, not alternative orchestrators. Mailbox stays the handoff transport; the graph reads it in reconcile_inbox (handoffs are data, not graph edges).
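One possible shape for the runtime port, sketched with `typing.Protocol`. Everything beyond the names AgentRuntime, ClaudeRuntime, and CodexRuntime is an assumption: the `run_turn` method, the `TurnRequest`/`TurnResult` types, the `base_url` constructor argument, and the registry are illustrative, and the adapters return canned results instead of invoking a CLI.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class TurnRequest:
    """Hypothetical input for one agent turn."""
    session_id: str
    prompt: str


@dataclass(frozen=True)
class TurnResult:
    """Hypothetical outcome of one agent turn."""
    session_id: str
    runtime: str
    accepted: bool


class AgentRuntime(Protocol):
    """The port: the engine only ever sees this interface."""
    name: str

    def run_turn(self, request: TurnRequest) -> TurnResult: ...


class ClaudeRuntime:
    name = "claude"

    def run_turn(self, request: TurnRequest) -> TurnResult:
        # A real adapter would own Claude-specific CLI, hook, and MCP wiring.
        return TurnResult(request.session_id, self.name, True)


class CodexRuntime:
    name = "codex"

    def __init__(self, base_url: str) -> None:
        # Transport details (e.g. a proxy bridge URL) stay inside the adapter.
        self.base_url = base_url

    def run_turn(self, request: TurnRequest) -> TurnResult:
        return TurnResult(request.session_id, self.name, True)


def dispatch_turn(
    runtimes: Dict[str, AgentRuntime], runtime_name: str, request: TurnRequest
) -> TurnResult:
    """Engine-side call: pick an adapter by name and run one turn."""
    if runtime_name not in runtimes:
        raise ValueError(f"unknown runtime: {runtime_name}")
    return runtimes[runtime_name].run_turn(request)
```

With this shape, an adapter can be exercised in isolation by constructing it directly and calling `run_turn`, without standing up the engine or a workspace.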

Migration is incremental and non-breaking: (P1) extract the AgentRuntime port + adapters and route the current dispatch through it; (P2) fold the tick’s gate/observe/launch/close into graph nodes and make the LangGraph engine the live driver for project-delivery; (P3) prove a run end-to-end on the engine, then retire the parallel tick. Never delete the tick before the graph path is proven live.