The CEO Agent
A single long-running agent loop that takes plain-English goals, decomposes them into work, delegates to the right specialist, and reports back with real outcomes. The one you actually talk to.
How it works
The CEO runs as a single long-running query() on
@anthropic-ai/claude-agent-sdk.
The source lives at apps/engine/src/ceo/loop.ts.
Stdin accepts NDJSON user turns; stdout emits assistant messages,
tool calls, and bb_event cards as newline-delimited
JSON. The two halves run concurrently inside one SDK call —
the loop never terminates between turns, which is what gives
the CEO its session-long memory.
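The NDJSON framing described above can be sketched as a small decoder and encoder. This is a minimal illustration, not the code in loop.ts: the message shapes (`UserTurn`, `OutboundEvent`) and the helper names are assumptions made up for the example.

```typescript
// Hypothetical message shapes: the real types in loop.ts are not shown in this document.
type UserTurn = { type: "user"; text: string };
type OutboundEvent =
  | { type: "assistant"; text: string }
  | { type: "tool_call"; name: string; input: unknown }
  | { type: "bb_event"; card: unknown };

// Narrows an arbitrary parsed JSON value to a UserTurn.
function isUserTurn(value: unknown): value is UserTurn {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return v["type"] === "user" && typeof v["text"] === "string";
}

// Turns arbitrary stdin chunks into complete user turns, one per line.
class NdjsonDecoder {
  private buffer = "";

  push(chunk: string): UserTurn[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is "" or an incomplete line; keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    const turns: UserTurn[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      // Malformed JSON throws here; a real loop would decide whether to skip or surface it.
      const parsed: unknown = JSON.parse(trimmed);
      if (isUserTurn(parsed)) turns.push(parsed);
    }
    return turns;
  }
}

// Serializes one outbound event as a single newline-terminated JSON line.
function encodeEvent(event: OutboundEvent): string {
  return JSON.stringify(event) + "\n";
}
```

Because the decoder carries partial lines across chunks, a user turn split across two stdin reads is still delivered exactly once.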
On each owner turn, the CEO pumps the message into the SDK,
which plans, emits tool_use blocks, and either
answers directly or delegates. Delegation goes through an MCP
tool server (bb-ceo-tools) exposing
delegate_to_coding_specialist,
delegate_to_content_specialist, and the 16 other
specialist delegates. Each delegate spawns the specialist as
a sub-agent, pipes back a structured result, and bubbles up
any intermediate owner cards.
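One way to picture a delegate is as a function that consumes a specialist's event stream, forwards cards as they arrive, and returns the final structured result. The sketch below assumes hypothetical `SpecialistEvent`, `SpecialistResult`, and `OwnerCard` shapes; the real bb-ceo-tools handlers are not shown in this document.

```typescript
// All types and names here are illustrative assumptions.
type OwnerCard = { kind: string; title: string };
type SpecialistResult = { ok: boolean; summary: string };
type SpecialistEvent =
  | { type: "card"; card: OwnerCard }
  | { type: "result"; result: SpecialistResult };

// A specialist is modeled as something that streams events for a given brief.
type Specialist = (brief: string) => AsyncIterable<SpecialistEvent>;

async function delegate(
  specialist: Specialist,
  brief: string,
  onCard: (card: OwnerCard) => void,
): Promise<SpecialistResult> {
  for await (const event of specialist(brief)) {
    if (event.type === "card") {
      // Bubble intermediate owner cards up while the specialist is still working.
      onCard(event.card);
    } else {
      // The first result event ends the delegation.
      return event.result;
    }
  }
  // The stream ended without a result; report that as a failed delegation.
  return { ok: false, summary: "specialist ended without a result" };
}
```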
Context management is non-trivial. Every user turn and every
assistant/result message is appended to
~/.blackbox/memory/session-<ts>.jsonl for
forensic replay. A background summarizer tracks a running
token estimate; when it crosses 150K it compresses older turns
into a "session-so-far" preamble and injects it into the next
user turn. Every 25 user turns a Markdown checkpoint is written
to ~/.blackbox/memory/checkpoints/. On startup the
most recent checkpoint is loaded and prepended to the first
user turn as a "here's where we left off" preamble.
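The bookkeeping behind compaction and checkpointing can be sketched as a small tracker. The 150K threshold and the 25-turn interval come from the text above; the four-characters-per-token heuristic, the class name, and the method names are assumptions for illustration only.

```typescript
const COMPACT_THRESHOLD_TOKENS = 150_000; // from the description above
const CHECKPOINT_EVERY_USER_TURNS = 25; // from the description above

// Assumption: a rough heuristic of about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

class ContextTracker {
  private tokenEstimate = 0;
  private userTurns = 0;

  // Records one message and reports which maintenance actions are now due.
  recordMessage(
    role: "user" | "assistant",
    text: string,
  ): { compact: boolean; checkpoint: boolean } {
    this.tokenEstimate += estimateTokens(text);
    let checkpoint = false;
    if (role === "user") {
      this.userTurns += 1;
      checkpoint = this.userTurns % CHECKPOINT_EVERY_USER_TURNS === 0;
    }
    return { compact: this.tokenEstimate > COMPACT_THRESHOLD_TOKENS, checkpoint };
  }

  // After compaction the estimate restarts from the size of the summary preamble.
  resetAfterCompaction(summary: string): void {
    this.tokenEstimate = estimateTokens(summary);
  }
}
```

Keeping the estimate cheap matters more than keeping it exact here, since it only decides when to trigger the summarizer.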
Budgets and circuit breakers run underneath all of this. The
CEO has a per-turn cost cap (CEO_TURN_COST_CAP_CENTS
in context/budgets.ts) and every task the CEO
spawns is tracked by the circuit breaker in
apps/api/src/engine/circuit-breaker.ts — tool-call
count, token spend, duration, idle time, and output-loop
similarity. Any one of those tripping halts the offending task
and surfaces an error_alert card to the owner.
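The five halt conditions listed above can be sketched as a single check over per-task statistics. The field names, limits, and the word-overlap similarity measure are assumptions chosen for the example; the actual logic in circuit-breaker.ts is not shown in this document and may measure similarity differently.

```typescript
type HaltReason = "tool_call_count" | "token_spend" | "duration" | "idle_time" | "output_loop";

// Hypothetical per-task statistics and limits.
interface TaskStats {
  toolCalls: number;
  tokensSpent: number;
  startedAtMs: number;
  lastActivityMs: number;
  recentOutputs: string[]; // most recent last
}

interface BreakerLimits {
  maxToolCalls: number;
  maxTokens: number;
  maxDurationMs: number;
  maxIdleMs: number;
  loopSimilarity: number; // 0..1, at or above this two outputs count as a loop
}

// Jaccard similarity over lowercase word sets (an assumed stand-in measure).
function wordSimilarity(a: string, b: string): number {
  const setA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const setB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  setA.forEach((word) => {
    if (setB.has(word)) shared += 1;
  });
  return shared / (setA.size + setB.size - shared);
}

// Returns the first limit that tripped, or null if the task may continue.
function checkBreaker(stats: TaskStats, limits: BreakerLimits, nowMs: number): HaltReason | null {
  if (stats.toolCalls > limits.maxToolCalls) return "tool_call_count";
  if (stats.tokensSpent > limits.maxTokens) return "token_spend";
  if (nowMs - stats.startedAtMs > limits.maxDurationMs) return "duration";
  if (nowMs - stats.lastActivityMs > limits.maxIdleMs) return "idle_time";
  const n = stats.recentOutputs.length;
  if (n >= 2) {
    const last = stats.recentOutputs[n - 1] ?? "";
    const prev = stats.recentOutputs[n - 2] ?? "";
    if (wordSimilarity(last, prev) >= limits.loopSimilarity) return "output_loop";
  }
  return null;
}
```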
What you see in the UI
In the Boardroom view you see the CEO's typing indicator, a
soft-trace Thinking strip above it when the
CEO emits ceo_thinking events, and a
PlanIndicator that renders
"Planning… / Executing step N of M" when the CEO runs a
propose_plan → execute_plan sequence.
You do not see JSON, terminal output, or raw tool calls.
When the CEO needs a decision, an approval, or has a finished
deliverable, it emits a card via emit_owner_card.
Those land in the Approvals Inbox. Everything else — routine
tool use, specialist handoffs, file reads — stays invisible.
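The visibility rules above amount to a routing table from engine events to UI surfaces. In the sketch below only `ceo_thinking` is an event name taken from the text; the other event types, the `Surface` names, and the convention that step 0 means "still planning" are assumptions.

```typescript
// Only "ceo_thinking" is named in the text; the other event types are hypothetical.
type EngineEvent =
  | { type: "ceo_thinking"; text: string }
  | { type: "plan_progress"; step: number; total: number }
  | { type: "owner_card"; kind: string }
  | { type: "tool_call"; name: string };

type Surface = "thinking_strip" | "plan_indicator" | "approvals_inbox" | "hidden";

// Decides where, if anywhere, an event shows up for the owner.
function surfaceFor(event: EngineEvent): Surface {
  switch (event.type) {
    case "ceo_thinking":
      return "thinking_strip";
    case "plan_progress":
      return "plan_indicator";
    case "owner_card":
      return "approvals_inbox";
    default:
      // Routine tool use and specialist handoffs never reach the UI.
      return "hidden";
  }
}

// Assumed convention: step 0 means the plan is still being proposed.
function planLabel(step: number, total: number): string {
  return step === 0 ? "Planning…" : `Executing step ${step} of ${total}`;
}
```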
A concrete example
You type "ship me a landing page for my pilates studio." The
CEO recognizes this as the landing-page-bootstrap
playbook trigger. It runs the gates (GitHub connected? Railway
connected? onboarding answers present?), then delegates a copy
brief to the Coding Specialist, then a render brief, then a
GitHub push, then a Railway deploy. It calls
review_deliverable_with_evaluator before surfacing
the result. When the evaluator returns PASS, it emits a single
report_ready card with the live URL. Total CEO
turns the owner sees: about three. Total specialist calls
under the hood: five.
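The shape of that run (gates first, then delegated steps in order, then the evaluator before anything is surfaced) can be sketched as a small runner. The `Gate`, `Step`, and `Outcome` types and the "FAIL" verdict are assumptions; the text only confirms that a PASS leads to a report_ready card.

```typescript
// Hypothetical playbook building blocks.
type Gate = { name: string; passes: () => Promise<boolean> };
type Step = { name: string; run: () => Promise<string> };
type Verdict = "PASS" | "FAIL"; // "FAIL" is assumed; the text only mentions PASS
type Outcome =
  | { status: "blocked"; gate: string }
  | { status: "rejected" }
  | { status: "report_ready"; result: string };

async function runPlaybook(
  gates: Gate[],
  steps: Step[],
  evaluate: (result: string) => Promise<Verdict>,
): Promise<Outcome> {
  // Stop before any work is delegated if a precondition is missing.
  for (const gate of gates) {
    if (!(await gate.passes())) return { status: "blocked", gate: gate.name };
  }
  // Run the delegated steps strictly in order; the last output is the deliverable.
  let deliverable = "";
  for (const step of steps) {
    deliverable = await step.run();
  }
  // Nothing reaches the owner until the evaluator has reviewed it.
  const verdict = await evaluate(deliverable);
  return verdict === "PASS"
    ? { status: "report_ready", result: deliverable }
    : { status: "rejected" };
}
```

In the pilates example, the gates would be the GitHub, Railway, and onboarding checks, and the final step's output would be the live URL.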
Technical details
The CEO's allowed-tool list is a closed set: see
CEO_TOOL_NAMES in ceo/tools.ts.
Adding a tool requires adding it to that list; the SDK rejects
anything outside it. This is how we keep the CEO from
accidentally shelling out or writing files directly.
```typescript
// ceo/tools.ts
export const CEO_TOOL_NAMES = [
  'mcp__bb-ceo-tools__emit_owner_card',
  'mcp__bb-ceo-tools__delegate_to_coding_specialist',
  'mcp__bb-ceo-tools__delegate_to_content_specialist',
  // …16 more delegates…
  'mcp__bb-ceo-tools__review_deliverable_with_evaluator',
  'mcp__bb-ceo-tools__record_outcome',
  'mcp__bb-ceo-tools__consult_guidance',
  'mcp__bb-ceo-tools__propose_plan',
  'mcp__bb-ceo-tools__execute_plan',
  // …
] as const;
```
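A closed set like this lends itself to a type-level and runtime membership check. The sketch below uses a two-entry stand-in list rather than the full CEO_TOOL_NAMES, and the helper names `mcpToolName` and `isAllowedTool` are assumptions; how the list is actually handed to the SDK is not shown in this document.

```typescript
// Stand-in subset for illustration; the real list lives in ceo/tools.ts.
const ALLOWED_TOOLS = [
  "mcp__bb-ceo-tools__emit_owner_card",
  "mcp__bb-ceo-tools__propose_plan",
] as const;

// Union of the literal tool names, derived from the list itself.
type AllowedTool = (typeof ALLOWED_TOOLS)[number];

// Builds a name in the mcp__<server>__<tool> pattern seen in the list above.
function mcpToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

// Runtime guard: anything not in the list is rejected.
function isAllowedTool(name: string): name is AllowedTool {
  return (ALLOWED_TOOLS as readonly string[]).includes(name);
}
```

Deriving the type from the array means a tool added to the list is allowed in both the type system and the runtime check, with no second place to update.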
The SDK subprocess is spawned either as a compiled binary
(bb-engine-x86_64-apple-darwin for the desktop
sidecar) or as a Node library call (runEngine()
from @blackbox/engine, used by the Railway-hosted
API server). Same loop, different transport.
Related features
- The 18 specialists: the agents the CEO delegates to.
- Evaluator gate: why every report_ready card must pass review.
- Circuit breaker: the 5 halt reasons that keep runaway loops contained.
FAQ
Is the CEO a different model than the specialists?
No — CEO and specialists both run on Claude (Sonnet 4 by default). The difference is the system prompt, the tool belt, and the memory scope. The CEO gets delegate tools plus the owner card emitter; specialists get file-system, browser, or domain-specific tools.
What happens when the CEO's context window fills up?
A summarizer runs in the background and tracks a running token estimate. At around 150K tokens it compresses older turns into a "session-so-far" preamble that is injected into the next user turn. A Markdown checkpoint is also written every 25 user turns to ~/.blackbox/memory/checkpoints/. On restart the most recent checkpoint is prepended to the first user turn.
Does the CEO ever execute code or write files directly?
No. The CEO only calls MCP tools: delegate_to_*, emit_owner_card, review_deliverable_with_evaluator, record_outcome, consult_guidance, propose_plan, and execute_plan. Every file write, browser click, or API call goes through a specialist. That separation is how we keep the orchestration layer auditable.
How does the CEO know which specialist to pick?
The CEO system prompt lists each specialist's role and when to use them. For ambiguous tasks, the CEO can call propose_plan first, then execute_plan once the plan passes its own sanity checks. Playbooks like landing-page-bootstrap.md hardcode the routing for well-known sequences.
Try Black Box
A CEO agent plus 18 specialists, one subscription, seven minutes from sign-up to a live URL.