Circuit Breaker (AI Safety)

A circuit breaker is a safeguard that halts an agent loop when it exceeds budget, loops on errors, or breaches a policy, preventing runaway behavior.

In plain English

A circuit breaker in AI safety is the same idea as a circuit breaker in an electrical panel: when something goes wrong, it cuts the power. In an agent system, "something wrong" can mean several things: the agent is burning tokens in a loop, it is repeating the same failing tool call, it is about to touch a forbidden resource, or it has run past its budget. The circuit breaker detects the condition, halts the run, and reports what happened to the human operator.

Breakers protect against the failure mode most characteristic of agentic systems: quiet, expensive divergence. A chatbot that gives a bad answer costs you five seconds. An agent in a pathological loop can burn dollars per minute and take destructive actions against your data. Serious agent deployments wire in breakers on cost, time, tool-call count, and error rate.

Why it matters for Black Box

Black Box's circuit breakers are documented in agent-constitutions/circuit-breaker.md. They cap per-task tokens, per-session wall-time, consecutive tool errors, and forbidden-action attempts. A tripped breaker pauses the run, notifies the owner, and lets them resume, restart, or abandon.
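
To make the four caps concrete, here is a minimal Python sketch of a breaker that tracks them. It is not Black Box's implementation: the class names, method names, and default thresholds are all hypothetical, chosen only for illustration, and the pause, notify, and resume flow is left out.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Limits:
    # Hypothetical thresholds, not values taken from circuit-breaker.md.
    max_task_tokens: int = 200_000
    max_session_seconds: float = 1800.0
    max_consecutive_tool_errors: int = 3
    max_forbidden_attempts: int = 1


@dataclass
class CircuitBreaker:
    limits: Limits
    started_at: float = field(default_factory=time.monotonic)
    task_tokens: int = 0
    consecutive_tool_errors: int = 0
    forbidden_attempts: int = 0
    tripped_reason: Optional[str] = None

    def record_tokens(self, n: int) -> None:
        self.task_tokens += n

    def record_tool_result(self, ok: bool) -> None:
        # A success resets the streak; only consecutive errors count.
        self.consecutive_tool_errors = 0 if ok else self.consecutive_tool_errors + 1

    def record_forbidden_attempt(self) -> None:
        self.forbidden_attempts += 1

    def check(self, now: Optional[float] = None) -> Optional[str]:
        """Return the trip reason, or None if the run may continue."""
        if self.tripped_reason is not None:
            return self.tripped_reason  # stay tripped until a human intervenes
        now = time.monotonic() if now is None else now
        lim = self.limits
        if self.task_tokens > lim.max_task_tokens:
            self.tripped_reason = "task token cap exceeded"
        elif now - self.started_at > lim.max_session_seconds:
            self.tripped_reason = "session wall-time cap exceeded"
        elif self.consecutive_tool_errors >= lim.max_consecutive_tool_errors:
            self.tripped_reason = "too many consecutive tool errors"
        elif self.forbidden_attempts >= lim.max_forbidden_attempts:
            self.tripped_reason = "forbidden action attempted"
        return self.tripped_reason
```

An agent loop would call check() before each step and stop as soon as it returns a reason. Keeping the breaker tripped until someone intervenes mirrors the pause-then-decide behavior described above.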

Examples

Halting when the same file is edited twelve times in a row without a passing test run.
Stopping a session that exceeds $5 in model spend.
Blocking any tool call that targets a production domain without approval.
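
The three examples above could be checked with something like the following Python sketch. The twelve-edit and $5 thresholds come from the examples; everything else (the function and class names, the placeholder domain, and the subdomain-matching rule) is an assumption made for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

MAX_EDITS_WITHOUT_PASS = 12                 # from the first example
MAX_SESSION_SPEND_USD = 5.00                # from the second example
PRODUCTION_DOMAINS = {"prod.example.com"}   # hypothetical placeholder


class EditLoopDetector:
    """Counts consecutive edits to one file since the last passing test run."""

    def __init__(self) -> None:
        self.last_path: Optional[str] = None
        self.streak = 0

    def record_edit(self, path: str) -> bool:
        """Return True when the same file has reached the edit limit."""
        if path == self.last_path:
            self.streak += 1
        else:
            self.last_path, self.streak = path, 1
        return self.streak >= MAX_EDITS_WITHOUT_PASS

    def record_test_pass(self) -> None:
        # A passing test run clears the streak.
        self.last_path, self.streak = None, 0


def over_budget(spend_usd: float) -> bool:
    return spend_usd > MAX_SESSION_SPEND_USD


def tool_call_allowed(url: str, approved: bool) -> bool:
    """Block calls to production hosts (or their subdomains) unless approved."""
    host = urlparse(url).hostname or ""
    is_production = any(
        host == d or host.endswith("." + d) for d in PRODUCTION_DOMAINS
    )
    return approved or not is_production
```

Each check returns a plain boolean so the caller decides how to halt and whom to notify; that split is a design choice of this sketch, not something the examples prescribe.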
