AgentSquad Runtime — Current Implementation Map
Snapshot of the current code (not the target architecture). Scope: the studio
project_delivery/agentsquadtemplate path. File:line citations are evidence; verify before relying on them — code moves.
TL;DR — two planes, two “graphs”, one live dispatcher
- Temporal = the durable when. It triggers run execution and runs the periodic heartbeat sweep that ticks every live run. It does not itself step agents.
- The control-plane tick = the live how. run_dispatch.execute_run() is the real reconcile loop: it launches each agent session into a Daytona workspace, enforces the first-tool gate, observes runner completion, revives mailbox handoffs, and closes the run. This is the path the live squad actually runs on (it is where waiting_first_tool, dispatch_slot_prompt, mailbox revive, etc. live).
- LangGraph appears in two distinct, narrower roles, and neither is “the squad brain”:
  - a compiled StateGraph in harness_runtime that models one execution wave (intake → fanout → collect → reconcile → decide), run via a remote LangGraph Agent Server; and
  - spec/contract-only metadata (agent_graph/contract.py, langgraph_prompt_contract.py) embedded in the desired spec and prompt shape.

  The agent reasoning and the handoff decisions live in neither graph.

The single biggest architectural finding: two overlapping orchestration representations
(the LangGraph harness wave-graph vs. the control-plane execute_run tick) model the same
“dispatch → observe → reconcile → decide-next” cycle. The live project-delivery path is the
control-plane tick; the harness LangGraph graph’s live wiring for project-delivery runs is
unclear and is the prime consolidation target.
Temporal involvement
backend/src/fractalops/contexts/orchestration/infrastructure/temporal_workflows.py defines
8 workflows over a shared _WorkflowPhaseState base:
| Workflow | Owns |
|---|---|
| ProposalMutationWorkflow | proposal-bound mutations |
| ReviewDecisionWorkflow | approval/review (signal-driven state) |
| RuntimeAssetOperationWorkflow | node wake/sleep, wizard tries |
| ProjectDeletionWorkflow | deferred teardown (purge-delay sleep) |
| ConnectorSyncWorkflow | identity reconcile / access sync |
| ProofClosureWorkflow | dataset proof closure (scheduled) |
| StudioRunWorkflow | AgentSquad run execution (dispatch/observe trigger) |
| ScheduledReconciliationWorkflow | periodic background jobs |

Squad trigger path (Temporal triggers; the activity does the work inline):

    HTTP → StudioRuntimeCommandService.queue_run_execution()
      → start_studio_run(kind=STUDIO_RUN_EXECUTE, run_id)                   temporal_client.py:138
      → start_typed_workflow(STUDIO_RUN, …) [async fire-and-forget]         temporal_client.py:57
      → StudioRunWorkflow.run()                                             temporal_workflows.py:304
      → studio_run_activity()                                               temporal_workflows.py:137
      → run_studio_run_temporal()
      → _run_default_studio_run_execute(run_id)
      → runtime_command_service.execute_pending_run_execution(run_id)       [INLINE dispatch/observe]

Heartbeat sweep: a Temporal schedule fires StudioRunWorkflow(STUDIO_RUN_EXECUTE) with
no run_id every temporal_studio_interval_seconds; the activity sweeps every run whose
status ∈ {admitted, launching, running, blocked} and queues+executes each inline
(orchestration_scheduled_reconciliation_handlers.py:221-277).
“API queues; Temporal executes” holds, with a nuance: for squad runs the dispatch/observe cycle runs synchronously inside the activity — no per-session sub-workflows.
temporal_enabled=false (lab/embedded): no inline fallback. The worker sleeps
(temporal_worker.py:121), but any start_studio_run() raises RuntimeError("Temporal is disabled") (temporal_client.py:63). So squad dispatch is hard-coupled to Temporal being on.
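
As a rough illustration of the sweep and the disabled-Temporal behaviour described above, the sketch below filters runs by the four live statuses and executes each one inline. Run, sweep_live_runs, the execute callback, and the temporal_enabled keyword are hypothetical stand-ins, not the real handler signatures; only the status set and the error message come from this document.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Statuses the heartbeat sweep treats as live (taken from the text above).
LIVE_STATUSES = frozenset({"admitted", "launching", "running", "blocked"})


@dataclass
class Run:
    """Hypothetical minimal run record; the real model carries far more state."""
    run_id: str
    status: str


def start_studio_run(run_id: Optional[str], *, temporal_enabled: bool) -> str:
    """Stand-in for the client call: there is no inline fallback when Temporal is off."""
    if not temporal_enabled:
        raise RuntimeError("Temporal is disabled")
    # run_id=None models the scheduled sweep, which is fired without a run id.
    return f"queued:{run_id or 'sweep'}"


def sweep_live_runs(runs: Iterable[Run], execute: Callable[[str], None]) -> List[str]:
    """Execute every live run inline, in order, and return the ids that were ticked."""
    ticked: List[str] = []
    for run in runs:
        if run.status in LIVE_STATUSES:
            execute(run.run_id)  # synchronous call: no per-session sub-workflow
            ticked.append(run.run_id)
    return ticked
```

Passing a recording callback as execute makes the inline, in-order behaviour easy to assert in a unit test without a Temporal worker.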
LangGraph involvement
(1) Harness wave-graph (actually compiled & executable).
harness_runtime/application/graph.py:459 compiles a StateGraph over HarnessGraphState:
intake_run → bind_armory → allocate_workspace → route_frontier → dispatch_agents
(Send() fanout to run_agent_attempt) → collect → reconcile_inbox → decide_next (Command) → publish_state. It is invoked via LangGraphAgentServerGateway (the langgraph_sdk client →
a remote LangGraph Agent Server; gateway.py:37,58-93) from
HarnessRuntimeService.execute_studio_run() (runtime_service.py:168). run_agent_attempt
only queues a session to the runtime — it does not run agent reasoning, and handoffs are
explicitly not fed through graph edges (graph.py:223-229: “the mailbox lives in the
dispatch plane”).
(2) Spec/contract-only (not executed).
orchestration/domain/agent_graph/contract.py:agent_graph_role_contract() emits per-role
LangGraph identity metadata (thread_owner="langgraph", assistant/skill/MCP bindings) that
agentsquad_desired_spec.py:88-96 stamps into the desired state. langgraph_prompt_contract.py
is a message-shape adapter so agent state matches the LangGraph/LangChain prompt convention.
(3) portal_copilot/application/graph.py — a separate 2-node copilot graph, unrelated to
squad execution.
Net: LangGraph is not the squad’s decision engine. It (a) optionally models one concurrency wave and (b) supplies contract/prompt shapes. The live per-tick dispatch is the control-plane loop below.
Run lifecycle (the live control-plane path)
Create (POST /v1/admin/studio/runs):

    create_run()                                                            studio_control_plane_service.py:353
      → run_lifecycle.create_run() → _create_run_with_preflight()           run_lifecycle.py:260,338
      → desired_agent_squad_spec(); provision_agent_roster_identities()
      → repo.create_run(); _create_seed_sessions() → create_seed_sessions() features/lifecycle/session_seed.py
      → runtime_command_service.queue_run_execution()                       (→ Temporal STUDIO_RUN_EXECUTE)

Tick (run_dispatch.execute_run(), ~per heartbeat), for each session:
1. infer runner paths + real first tool from events
2. revive terminal sessions with pending mailbox (_revive_terminal_mailbox_session)
3. clear stale walls when source evidence is observed (dispatch_wall_fallback)
4. first-tool deadline (run_dispatch.py:752-817): if waiting_first_tool and elapsed ≥ FIRST_TOOL_EVIDENCE_DEADLINE_SECONDS (120s, clock resets on each liveness signal) → wall agent_first_tool_required/source_evidence_wall
5. release terminal slots
6. observe runner completion (observe_daytona_toolbox_slot_session → apply_execution_runner_result)
7. launch ready sessions → dispatch_run_session_prompt → run_daytona_toolbox_slot_session (daytona_runner)
8. close_run_when_all_sessions_terminal()

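
The first-tool deadline step is the one piece of the tick with real timing logic, so a small sketch may help. It assumes a hypothetical SessionClock record with launched_at and last_liveness_at timestamps; the 120-second constant, the waiting_first_tool status, and the wall name come from the text. How the real code chooses between agent_first_tool_required and source_evidence_wall is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_TOOL_EVIDENCE_DEADLINE_SECONDS = 120  # value quoted in the text above


@dataclass
class SessionClock:
    """Hypothetical per-session timing state (field names are illustrative)."""
    status: str
    launched_at: float
    last_liveness_at: Optional[float] = None  # most recent liveness signal, if any

    def silence_origin(self) -> float:
        # The clock resets on each liveness signal, so measure from the latest one.
        if self.last_liveness_at is None:
            return self.launched_at
        return max(self.launched_at, self.last_liveness_at)


def first_tool_wall(session: SessionClock, now: float) -> Optional[str]:
    """Return a wall reason once the first-tool deadline has lapsed, else None."""
    if session.status != "waiting_first_tool":
        return None  # the gate only applies while waiting for first-tool evidence
    elapsed = now - session.silence_origin()
    if elapsed >= FIRST_TOOL_EVIDENCE_DEADLINE_SECONDS:
        return "agent_first_tool_required"
    return None
```

Measuring from the latest liveness signal rather than from launch is what keeps a slow but alive agent from being walled prematurely.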
First-tool gate (project_delivery_metadata.py): satisfied by fractalops_workspace_*,
mcp__*, claude-native (Bash/Write/Edit/Read/Glob/Grep), or codex (codex:shell:*,
codex:*). (Runtime-agnostic — codex coders clear it with any codex: call.)
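
The gate itself reduces to a predicate over tool names. The sketch below is one way to express the patterns listed above; satisfies_first_tool_gate is a hypothetical name, and exact, case-sensitive matching is an assumption rather than something the source confirms.

```python
# Tool names that count as first-tool evidence (patterns taken from the text above).
CLAUDE_NATIVE_TOOLS = frozenset({"Bash", "Write", "Edit", "Read", "Glob", "Grep"})

# "codex:" also covers the narrower "codex:shell:*" pattern.
EVIDENCE_PREFIXES = ("fractalops_workspace_", "mcp__", "codex:")


def satisfies_first_tool_gate(tool_name: str) -> bool:
    """True if a tool call with this name clears the first-tool gate."""
    if tool_name in CLAUDE_NATIVE_TOOLS:
        return True
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return tool_name.startswith(EVIDENCE_PREFIXES)
```

Because the check depends only on the tool name, it stays runtime-agnostic: a codex session clears it through the codex: prefix and a claude session through the native tool set.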
Report → handoff → mailbox → next launch:

    report_recorder.record_report()                                         report_recorder.py:146
      → _record_report_handoff()
      → StudioReportHandoffRecorder.record_handoff()                        report_handoff.py:223
      → per target: _find_agent_session() | _spawn_handoff_target_session() → ensure_handoff_target_session()
      → build_handoff_payload() → create_or_refresh_pending_mailbox_entry()
      → queue_agent_mailbox_delivery() → session.status="ready", pendingMailboxDelivery=True
      → next execute_run tick revives the target with the mailbox payload

Completion → PR: project_delivery_completion_requires_pr()
(project_delivery_completion_policy.py:303) blocks backend/frontend/contract completion that
lacks pr_url+branch_name; run_result_projection.py then auto-hands-off to committer/tester.
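
The completion policy can be pictured as a small predicate. This is a sketch only: completion_blocked_for_missing_pr and the report mapping are hypothetical, and it assumes that a missing or empty value for either pr_url or branch_name is enough to block completion, which the source does not spell out.

```python
from typing import Mapping

# Roles whose completion needs PR evidence (taken from the text above).
PR_REQUIRED_ROLES = frozenset({"backend", "frontend", "contract"})


def completion_blocked_for_missing_pr(role: str, report: Mapping[str, object]) -> bool:
    """True when a PR-bearing role reports completion without pr_url and branch_name."""
    if role not in PR_REQUIRED_ROLES:
        return False  # other roles (e.g. planner) are not subject to this gate
    # Assumption: both fields must be present and non-empty.
    has_pr_evidence = bool(report.get("pr_url")) and bool(report.get("branch_name"))
    return not has_pr_evidence
```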
Subflows
| Subflow | Where |
|---|---|
| Coordination-completion handoff (planner/curator → next) | run_result_projection._dispatch_completion_handoff |
| Source-ref delivery handoff (coder → committer/tester) | run_result_projection._dispatch_source_ref_handoff |
| Mailbox delivery / revive | features/team/mailbox.py + run_dispatch.py:498 |
| Wall fallback (retry / skip / workspace reset) | features/execution/wall_fallback_contract.py |
| Replay (fresh retry of terminal session) | features/runtime/run_replay.py |
| Session intervention (pause/resume/override/switch_runtime) | features/runtime/session_intervention.py |
| Heartbeat / liveness | features/runtime/agent_heartbeat.py |
| Cleanup (release all slots) | features/execution/run_closure.py |
Duplicate / hop-only patterns to consolidate
Ranked by payoff:
- Plan→execute split between the LangGraph graph and the control-plane tick.

  CORRECTION (deeper read): these are NOT two parallel engines. execute_pending_run_execution (pending_run.py) runs them as a pipeline. The LangGraph graph (execute_studio_run) is the PLANNER: it picks the frontier/wave (which sessions launch this turn, concurrency, armory bind) and returns launched_sessions. The control-plane tick (service.execute_run(session_ids=<graph's chosen sessions>)) is then the EXECUTOR: it launches exactly those sessions into daytona and runs the lifecycle (first-tool gate, observe-runner, close, mailbox-revive). LangGraph is therefore ALREADY the orchestration brain.

  The real duplication is that the tick (run_dispatch ~1850L + run_result_projection ~1960L) also holds orchestration DECISION logic (frontier/lifecycle/close/handoff-routing) that overlaps the graph’s. P2 = move that decision logic INTO graph nodes and leave the tick a pure executor (daytona I/O only). The graph’s run_agent_attempt currently emits exec-cell metadata that the tick re-derives; unify on the graph as the single decision authority.

  DECIDED (operator, 2026-06): LangGraph is the standard orchestration engine. Rationale: beyond coding agents, the squad integrates into the broader LangChain/LangGraph ecosystem later, so the orchestration brain must be a LangGraph StateGraph, not a bespoke tick loop.

  Target shape (see “Target: LangGraph as the standard engine” below): the control-plane tick’s responsibilities (first-tool gate, observe-runner, launch, close, mailbox revive) fold into LangGraph nodes; the daytona workspace + mailbox become the substrate the graph drives, not a parallel engine; runtime execution is a clean AgentRuntime port with claude/codex/antigravity adapters. Migrate incrementally and non-breakingly; do not rip out the tick before the graph path is proven end-to-end.
- First-tool wall decision logic split across run_dispatch.py and run_result_projection.py (_silence_window_origin, _source_evidence_first_tool_observed): one gate, two homes.
- Execution-slot metadata merge duplicated: execution_slot_prompt_dispatch_metadata() (studio_run_execution_command.py:631) vs. merged_execution_slot_metadata() (run_dispatch.py:577).
- Mailbox-revive metadata duplicated: mailbox.py vs. run_result_projection._fresh_mailbox_revive_metadata.
- DI-by-parameter wiring across native_ops_* + the native_operations.py facade (412 kwarg-forward lines + ~72 Callable params). This is being collapsed; see [[squad-provisioning-audit]] family work.
- Thin delegation hops: studio_control_plane_service (_create_and_execute_run, _launch_prompt, _preflight, _assert_preflight), run_lifecycle (5× _*_project_squad_*), Temporal enqueue_workflow (backend.py:11) + request_background_studio_* (background_workflow.py). Collapse these to direct calls/aliases where they add no logic.

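
The planner-then-executor pipeline described in the first item can be sketched as a two-step function. This is illustrative only: the Planner and Executor callable types and the execute_pending_run name are hypothetical, and calling the executor even for an empty wave is an assumption, not confirmed behaviour.

```python
from typing import Callable, List, Sequence

# Hypothetical callable shapes standing in for the graph and the tick.
Planner = Callable[[str], Sequence[str]]         # run_id -> session ids chosen for this wave
Executor = Callable[[str, Sequence[str]], None]  # run_id, session ids -> launch/observe side effects


def execute_pending_run(run_id: str, plan_wave: Planner, execute_run: Executor) -> List[str]:
    """Planner picks the wave; executor launches exactly those sessions."""
    launched_sessions = list(plan_wave(run_id))
    # Assumption: the executor still runs for an empty wave so it can observe and close.
    execute_run(run_id, launched_sessions)
    return launched_sessions
```

Keeping the decision (which sessions) in the planner and the I/O in the executor is the split the consolidation aims for; any frontier logic that remains inside the executor is the duplication called out above.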
Target: LangGraph as the standard engine
Decision (operator, 2026-06): the AgentSquad orchestration brain is a LangGraph StateGraph.
Everything else is a clean port/adapter around it so the squad slots into the LangChain/LangGraph
ecosystem (checkpointers, LangSmith, subgraphs, human-in-the-loop) without a rewrite.
Three layers, sharp boundaries:
- Engine (LangGraph StateGraph): the single source of orchestration truth, covering intake → bind armory → allocate workspace → route frontier → dispatch → collect → reconcile inbox → decide-next → publish. The control-plane tick (run_dispatch.execute_run) is absorbed into graph nodes (first-tool gate, observe-runner, launch, close, mailbox revive become node logic or tool-calls the graph makes). One engine: the bespoke tick stops being a second brain. State lives in HarnessGraphState; persistence is via a LangGraph checkpointer (so a run is resumable/inspectable through the LangGraph runtime, not a side DB tick).
- Runtime port (AgentRuntime): a narrow interface the dispatch/run_agent_attempt node calls to actually execute one agent turn, with adapters ClaudeRuntime, CodexRuntime, AntigravityRuntime. Each adapter owns its CLI/transport specifics (MCP wiring, hooks, skills config, auth/base-url, e.g. cliproxy bridge vs native) behind the same port. No codex-only shims leak into the engine. This closes the “codex agents not reaching cliproxy” class of bug: that becomes one adapter’s contract, testable in isolation. See [[runtime-abstraction-not-codex-only]].
- Substrate: daytona workspace (git worktrees), the mailbox handoff bus, and Temporal as the durable trigger/heartbeat for the engine. These are things the engine drives, not alternative orchestrators. Mailbox stays the handoff transport; the graph reads it in reconcile_inbox (handoffs are data, not graph edges).

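
To make the runtime port concrete, here is a minimal sketch of what an AgentRuntime interface with three adapters might look like. Only the names AgentRuntime, ClaudeRuntime, CodexRuntime, and AntigravityRuntime come from this document; the run_turn method, the TurnRequest/TurnResult types, the ADAPTERS registry, and the stubbed adapter bodies are assumptions for illustration, since the real port has not been extracted yet.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class TurnRequest:
    """Hypothetical input for one agent turn."""
    session_id: str
    prompt: str


@dataclass(frozen=True)
class TurnResult:
    """Hypothetical outcome; a real adapter would return runner handles or events."""
    session_id: str
    runtime: str
    queued: bool


class AgentRuntime(Protocol):
    """The port: the engine depends on this shape only, never on a concrete CLI."""
    name: str

    def run_turn(self, request: TurnRequest) -> TurnResult: ...


class ClaudeRuntime:
    name = "claude"

    def run_turn(self, request: TurnRequest) -> TurnResult:
        # Placeholder: claude-specific MCP wiring, hooks, and auth would live here.
        return TurnResult(request.session_id, self.name, queued=True)


class CodexRuntime:
    name = "codex"

    def run_turn(self, request: TurnRequest) -> TurnResult:
        # Placeholder: codex transport details (e.g. base-url selection) would live here.
        return TurnResult(request.session_id, self.name, queued=True)


class AntigravityRuntime:
    name = "antigravity"

    def run_turn(self, request: TurnRequest) -> TurnResult:
        # Placeholder: antigravity-specific configuration would live here.
        return TurnResult(request.session_id, self.name, queued=True)


# Registry keyed by runtime name, so the engine never branches on runtime type.
ADAPTERS: Dict[str, AgentRuntime] = {
    adapter.name: adapter
    for adapter in (ClaudeRuntime(), CodexRuntime(), AntigravityRuntime())
}


def dispatch_turn(runtime: AgentRuntime, request: TurnRequest) -> TurnResult:
    """What a dispatch node would do: delegate one turn to whichever adapter it was given."""
    return runtime.run_turn(request)
```

With this shape, each adapter can be tested in isolation against the same TurnRequest, which is the property the port is meant to provide.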
Migration is incremental and non-breaking: (P1) extract the AgentRuntime port + adapters and
route the current dispatch through it; (P2) fold the tick’s gate/observe/launch/close into graph
nodes and make the LangGraph engine the live driver for project-delivery; (P3) prove a run
end-to-end on the engine, then retire the parallel tick. Never delete the tick before the graph
path is proven live.