Web4Guru AI Operations

The CEO Agent

A single long-running agent loop that takes plain-English goals, decomposes them into work, delegates to the right specialist, and reports back with real outcomes. The one you actually talk to.

How it works

The CEO runs as a single long-running query() on @anthropic-ai/claude-agent-sdk. The source lives at apps/engine/src/ceo/loop.ts. Stdin accepts NDJSON user turns; stdout emits assistant messages, tool calls, and bb_event cards as newline-delimited JSON. The two halves run concurrently inside one SDK call — the loop never terminates between turns, which is what gives the CEO its session-long memory.
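
The NDJSON framing described above can be sketched as follows. This is a minimal illustration, not the code in loop.ts: the UserTurn shape, the NdjsonDecoder class, and encodeEvent are hypothetical names, and the real message schema is not shown in this document.

```typescript
// Hypothetical shape of one owner turn on stdin (assumption, not the real schema).
interface UserTurn {
  type: "user";
  text: string;
}

// Narrowing helper so callers can safely treat a parsed line as a user turn.
function isUserTurn(value: unknown): value is UserTurn {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return v.type === "user" && typeof v.text === "string";
}

// Splits a stream of text chunks into complete NDJSON records,
// buffering any partial trailing line until the next chunk arrives.
class NdjsonDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is "" if the chunk ended on a newline, else a partial line.
    this.buffer = lines.pop() ?? "";
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as unknown);
  }
}

// Serializes one outbound event as a single newline-terminated JSON line.
function encodeEvent(event: object): string {
  return JSON.stringify(event) + "\n";
}
```

Buffering the trailing partial line matters because a pipe can deliver a chunk that ends mid-record; parsing only complete lines keeps one malformed read from corrupting the turn stream.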

On each owner turn, the CEO pumps the message into the SDK, which plans, emits tool_use blocks, and either answers directly or delegates. Delegation goes through an MCP tool server (bb-ceo-tools) exposing delegate_to_coding_specialist, delegate_to_content_specialist, and the 16 other specialist delegates. Each delegate spawns the specialist as a sub-agent, pipes back a structured result, and bubbles up any intermediate owner cards.

Context management is non-trivial. Every user turn and every assistant/result message is appended to ~/.blackbox/memory/session-<ts>.jsonl for forensic replay. A background summarizer tracks a running token estimate; when it crosses 150K it compresses older turns into a "session-so-far" preamble and injects it into the next user turn. Every 25 user turns a Markdown checkpoint is written to ~/.blackbox/memory/checkpoints/. On startup the most recent checkpoint is loaded and prepended to the first user turn as a "here's where we left off" note.
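
The two thresholds above (summarize at roughly 150K tokens, checkpoint every 25 user turns) can be sketched as a small tracker. Everything here is an assumption for illustration: the SessionTracker class, the constant names, and the four-characters-per-token estimate are not taken from the Black Box source, and file I/O is deliberately left out.

```typescript
// Assumed constants mirroring the thresholds described in the text.
const SUMMARIZE_AT_TOKENS = 150_000;
const CHECKPOINT_EVERY_USER_TURNS = 25;

// Rough heuristic (an assumption): about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// What the caller should do after recording a message.
interface TurnActions {
  summarize: boolean;
  checkpoint: boolean;
}

class SessionTracker {
  private tokenEstimate = 0;
  private userTurns = 0;

  // Only user turns advance the checkpoint counter.
  recordUserTurn(text: string): TurnActions {
    this.userTurns += 1;
    this.tokenEstimate += estimateTokens(text);
    return this.actions(this.userTurns % CHECKPOINT_EVERY_USER_TURNS === 0);
  }

  // Assistant and result messages count toward tokens but never trigger a checkpoint.
  recordAssistantMessage(text: string): TurnActions {
    this.tokenEstimate += estimateTokens(text);
    return this.actions(false);
  }

  // Call once older turns have been compressed into the preamble.
  resetAfterSummary(preamble: string): void {
    this.tokenEstimate = estimateTokens(preamble);
  }

  private actions(checkpoint: boolean): TurnActions {
    return { summarize: this.tokenEstimate >= SUMMARIZE_AT_TOKENS, checkpoint };
  }
}
```

Keeping the tracker free of side effects means the caller decides when to write the checkpoint file or run the summarizer, which makes the thresholds easy to test in isolation.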

Budgets and circuit breakers run underneath all of this. The CEO has a per-turn cost cap (CEO_TURN_COST_CAP_CENTS in context/budgets.ts) and every task the CEO spawns is tracked by the circuit breaker in apps/api/src/engine/circuit-breaker.ts — tool-call count, token spend, duration, idle time, and output-loop similarity. Any one of those tripping halts the offending task and surfaces an error_alert card to the owner.
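
The five signals the circuit breaker watches can be sketched as a single check function. This is not the code in circuit-breaker.ts: the TaskStats and BreakerLimits shapes, the limit values a caller would pass, and the use of word-level Jaccard similarity to detect output loops are all assumptions made for illustration.

```typescript
// Hypothetical per-task counters (field names are assumptions).
interface TaskStats {
  toolCalls: number;
  tokensSpent: number;
  startedAtMs: number;
  lastActivityAtMs: number;
  recentOutputs: string[];
}

// Hypothetical limits; the real values are not given in this document.
interface BreakerLimits {
  maxToolCalls: number;
  maxTokens: number;
  maxDurationMs: number;
  maxIdleMs: number;
  loopSimilarity: number; // 0..1, trips when consecutive outputs are at least this similar
}

type TripReason = "tool_calls" | "tokens" | "duration" | "idle" | "output_loop";

// Word-level Jaccard similarity: shared distinct words over total distinct words.
function jaccard(a: string, b: string): number {
  const wordsA = new Set(a.split(/\s+/).filter(Boolean));
  const wordsB = new Set(b.split(/\s+/).filter(Boolean));
  let shared = 0;
  wordsA.forEach((w) => {
    if (wordsB.has(w)) shared += 1;
  });
  const union = wordsA.size + wordsB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Returns the first limit that tripped, or null if the task may continue.
function checkBreaker(stats: TaskStats, limits: BreakerLimits, nowMs: number): TripReason | null {
  if (stats.toolCalls > limits.maxToolCalls) return "tool_calls";
  if (stats.tokensSpent > limits.maxTokens) return "tokens";
  if (nowMs - stats.startedAtMs > limits.maxDurationMs) return "duration";
  if (nowMs - stats.lastActivityAtMs > limits.maxIdleMs) return "idle";
  const [prev, last] = stats.recentOutputs.slice(-2);
  if (prev !== undefined && last !== undefined && jaccard(prev, last) >= limits.loopSimilarity) {
    return "output_loop";
  }
  return null;
}
```

A non-null result would correspond to halting the task and surfacing an error_alert card; how that card is built is outside this sketch.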

What you see in the UI

In the Boardroom view you see the CEO's typing indicator, a soft-trace Thinking strip above it when the CEO emits ceo_thinking events, and a PlanIndicator that renders "Planning… / Executing step N of M" when the CEO runs a propose_plan → execute_plan sequence. You do not see JSON, terminal output, or raw tool calls.

When the CEO needs a decision, an approval, or has a finished deliverable, it emits a card via emit_owner_card. Those land in the Approvals Inbox. Everything else — routine tool use, specialist handoffs, file reads — stays invisible.

A concrete example

You type "ship me a landing page for my pilates studio." The CEO recognizes this as the landing-page-bootstrap playbook trigger. It runs the gates (GitHub connected? Railway connected? onboarding answers present?), then delegates a copy brief to the Coding Specialist, then a render brief, then a GitHub push, then a Railway deploy. It calls review_deliverable_with_evaluator before surfacing the result. When the evaluator returns PASS, it emits a single report_ready card with the live URL. Total CEO turns the owner sees: about three. Total specialist calls under the hood: five.
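
The gate step in this example can be sketched as a list of named predicates checked before any delegation happens. The WorkspaceState fields, the Gate type, and the gate names below are hypothetical; the actual playbook format in landing-page-bootstrap.md is not shown in this document.

```typescript
// Hypothetical snapshot of the owner's workspace (field names are assumptions).
interface WorkspaceState {
  githubConnected: boolean;
  railwayConnected: boolean;
  onboardingAnswers: Record<string, string>;
}

// A gate is a named precondition the playbook must satisfy before it runs.
interface Gate {
  name: string;
  passes: (state: WorkspaceState) => boolean;
}

// The three gates named in the example above, expressed as predicates.
const LANDING_PAGE_GATES: Gate[] = [
  { name: "github_connected", passes: (s) => s.githubConnected },
  { name: "railway_connected", passes: (s) => s.railwayConnected },
  {
    name: "onboarding_answers_present",
    passes: (s) => Object.keys(s.onboardingAnswers).length > 0,
  },
];

// Returns the names of every failing gate; an empty array means the playbook may proceed.
function failingGates(state: WorkspaceState, gates: Gate[] = LANDING_PAGE_GATES): string[] {
  return gates.filter((gate) => !gate.passes(state)).map((gate) => gate.name);
}
```

Returning all failing gates rather than stopping at the first lets the CEO ask the owner to fix everything in one card instead of several round trips.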

Technical details

The CEO's allowed-tool list is a closed set; see CEO_TOOL_NAMES in ceo/tools.ts. Adding a tool requires adding it to that list; the SDK rejects anything outside it. This is how we keep the CEO from accidentally shelling out or writing files directly.

// ceo/tools.ts
export const CEO_TOOL_NAMES = [
  'mcp__bb-ceo-tools__emit_owner_card',
  'mcp__bb-ceo-tools__delegate_to_coding_specialist',
  'mcp__bb-ceo-tools__delegate_to_content_specialist',
  // …16 more delegates…
  'mcp__bb-ceo-tools__review_deliverable_with_evaluator',
  'mcp__bb-ceo-tools__record_outcome',
  'mcp__bb-ceo-tools__consult_guidance',
  'mcp__bb-ceo-tools__propose_plan',
  'mcp__bb-ceo-tools__execute_plan',
  // …
] as const;
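
A closed list declared with as const can also drive a compile-time type and a runtime guard. The sketch below uses an abbreviated copy of the list and two hypothetical helpers, isCeoTool and rejectedTools; it illustrates the allow-list idea only and does not show how the SDK itself performs the rejection.

```typescript
// Abbreviated copy of the allow-list for illustration; the real list has more entries.
const CEO_TOOL_NAMES = [
  "mcp__bb-ceo-tools__emit_owner_card",
  "mcp__bb-ceo-tools__delegate_to_coding_specialist",
  "mcp__bb-ceo-tools__propose_plan",
  "mcp__bb-ceo-tools__execute_plan",
] as const;

// Union of the literal tool names, derived from the array so the two cannot drift apart.
type CeoToolName = (typeof CEO_TOOL_NAMES)[number];

// Hypothetical runtime guard: true only for names on the allow-list.
function isCeoTool(name: string): name is CeoToolName {
  return (CEO_TOOL_NAMES as readonly string[]).includes(name);
}

// Hypothetical helper: lists requested tools that fall outside the allow-list.
function rejectedTools(requested: string[]): string[] {
  return requested.filter((name) => !isCeoTool(name));
}
```

Deriving CeoToolName from the array means a new tool only has to be added in one place for both the type and the guard to accept it.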

The SDK subprocess is spawned either as a compiled binary (bb-engine-x86_64-apple-darwin for the desktop sidecar) or as a Node library call (runEngine() from @blackbox/engine, used by the Railway-hosted API server). Same loop, different transport.

FAQ

Is the CEO a different model than the specialists?

No — CEO and specialists both run on Claude (Sonnet 4 by default). The difference is the system prompt, the tool belt, and the memory scope. The CEO gets delegate tools plus the owner card emitter; specialists get file-system, browser, or domain-specific tools.

What happens when the CEO's context window fills up?

A summarizer runs in the background and tracks a running token estimate. At around 150K tokens it compresses older turns into a "session-so-far" preamble that is injected into the next user turn. A Markdown checkpoint is also written every 25 user turns to ~/.blackbox/memory/checkpoints/. On restart the most recent checkpoint is prepended to the first user turn.

Does the CEO ever execute code or write files directly?

No. The CEO only calls MCP tools from its allow-list, such as delegate_to_*, emit_owner_card, review_deliverable_with_evaluator, record_outcome, consult_guidance, propose_plan, and execute_plan. Every file write, browser click, or API call goes through a specialist. That separation is how we keep the orchestration layer auditable.

How does the CEO know which specialist to pick?

The CEO system prompt lists each specialist's role and when to use them. For ambiguous tasks, the CEO can call propose_plan first, then execute_plan once the plan passes its own sanity checks. Playbooks like landing-page-bootstrap.md hardcode the routing for well-known sequences.

Try Black Box

A CEO agent plus 18 specialists, one subscription, seven minutes from sign-up to a live URL.