Web4Guru AI Operations

Prompt Caching

Prompt caching is an API feature that stores and reuses large, repeated prompt prefixes across requests to reduce latency and cost.

In plain English

Most agent requests share a big identical prefix: the system prompt, the tool definitions, the playbook, the early history. Prompt caching lets you tell the API, "mark this prefix as cacheable." The provider stores the processed state on its side; subsequent requests that match the prefix reuse the cache, skipping the work. Anthropic's implementation bills cache reads at roughly 10% of the base input rate, about a 90% cut on cached tokens, charges a small premium on the initial cache write, and typically delivers a noticeable drop in time to first token.
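A minimal sketch of the mechanic using Anthropic's Python SDK, which is the implementation named above; the model name and prompt text here are placeholders:

    # pip install anthropic
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    SYSTEM_PROMPT = "You are a research specialist..."  # imagine ~20K tokens here

    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder; use your production model
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                # Everything up to and including this block becomes the
                # cacheable prefix; later requests with an identical prefix
                # read the cache instead of reprocessing it.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "Summarize yesterday's findings."}],
    )

    # usage reports cache_creation_input_tokens and cache_read_input_tokens,
    # so you can verify cache hits per request.
    print(response.usage)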

For agents, the win is big. A specialist's system prompt plus tool belt can be 20K tokens. Multiply by thirty turns and you are paying for 600K tokens that never change. Cache that prefix and you pay full price (plus the one-time write premium) on the first turn and the discounted cached rate on every turn after, as long as each turn lands before the cache expires; Anthropic's default entry lives five minutes and is refreshed on every hit. Agents built without prompt caching are leaving serious money on the table.
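Back-of-the-envelope math for that scenario, with an illustrative base rate and Anthropic's published multipliers (reads at 10% of base, a 25% premium on the initial write):

    # Illustrative pricing; substitute your model's actual rates.
    BASE_RATE = 3.00 / 1_000_000    # $ per input token (example figure)
    CACHE_WRITE = BASE_RATE * 1.25  # first turn: premium to write the cache
    CACHE_READ = BASE_RATE * 0.10   # later turns: ~90% discount on cached tokens

    prefix_tokens = 20_000          # system prompt + tool belt
    turns = 30

    uncached = prefix_tokens * turns * BASE_RATE
    cached = prefix_tokens * (CACHE_WRITE + (turns - 1) * CACHE_READ)

    print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}  "
          f"saved: {1 - cached / uncached:.0%}")

With these numbers the thirty-turn conversation drops from $1.80 to about $0.25 in prefix cost, roughly an 86% saving.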

Why it matters for Black Box

Black Box caches every specialist's system prompt and tool belt, plus the CEO agent's playbook library. Cache hits are the reason the product can afford to let the CEO delegate freely instead of rationing specialist calls. Margin on the Pro and Scale plans depends on this working.

Examples

  • Caching a 15K-token system prompt so the model only reprocesses new turns.
  • Caching a reference document so dozens of related questions can draw on it cheaply.
  • Warming the cache at session start so the first "real" turn is already fast (see the sketch after this list).
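A sketch of the warm-up idea, reusing the SDK call from the first example: fire one cheap request at session start whose only job is to write the cache, so the user's first real turn reads from it. warm_cache is a hypothetical helper, not an SDK function:

    def warm_cache(client, system_blocks):
        """Populate the prompt cache with a throwaway one-token request.

        Assumes the final entry in system_blocks carries a cache_control
        marker, as in the earlier example; the cache only matches requests
        that use the same model and the same byte-identical prefix.
        """
        client.messages.create(
            model="claude-sonnet-4-5",  # must match the model used for real turns
            max_tokens=1,               # minimal output; we only want the cache write
            system=system_blocks,
            messages=[{"role": "user", "content": "ping"}],
        )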

Related terms