Web4Guru AI Operations

Context Window

A context window is the maximum number of tokens a language model can attend to at once, including the prompt, conversation history, and tool outputs.

In plain English

A context window is the model's working memory. Every token the model sees on a given turn — the system prompt, the conversation history, any documents or tool outputs you included — must fit inside it. Modern context windows range from roughly 128K tokens (about 300 pages of text) to 1M+ tokens, and the ceiling keeps climbing.
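
To make that concrete, here is a minimal Python sketch of a budget check. The 4-characters-per-token ratio is a rough rule of thumb for English, not a real tokenizer, and the names (estimate_tokens, fits_in_context) and the 128K figure are illustrative assumptions, not any particular SDK.

    CONTEXT_WINDOW = 128_000  # illustrative; varies by model

    def estimate_tokens(text: str) -> int:
        # Crude heuristic: ~4 characters per token for English text.
        return len(text) // 4

    def fits_in_context(system_prompt: str, history: list[str],
                        tool_outputs: list[str]) -> bool:
        # Everything the model sees this turn must fit the window together.
        total = estimate_tokens(system_prompt)
        total += sum(estimate_tokens(m) for m in history)
        total += sum(estimate_tokens(o) for o in tool_outputs)
        return total <= CONTEXT_WINDOW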

But bigger is not free. Models attend less well to information buried deep in a long context (the so-called "lost in the middle" effect), and latency and cost scale with input length. Good agent design treats the context window as a precious resource: summarize old turns, retrieve only relevant documents, prune tool outputs before the next turn. A thoughtful agent running on a 200K-token model will often beat a sloppy agent on a 2M-token one.
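
A minimal sketch of that summarize-and-prune loop, reusing the estimate_tokens heuristic above. Here summarize is a placeholder for what would really be a cheap model call, and compact_history is an invented name, not Black Box's actual code.

    def summarize(text: str) -> str:
        # Placeholder for a cheap model call; here it just truncates.
        return text[:400]

    def compact_history(history: list[str], budget: int) -> list[str]:
        # Collapse the two oldest turns into one summary entry until
        # the transcript fits the working budget.
        history = list(history)
        while sum(estimate_tokens(m) for m in history) > budget and len(history) > 2:
            merged = summarize(history[0] + "\n" + history[1])
            history = ["[summary] " + merged] + history[2:]
        return history

The trade is deliberate: spend a little compute compacting old turns so every later turn is cheaper and better focused.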

Why it matters for Black Box

Black Box's CEO agent runs on a 1M-token Claude model with a working budget of 150K tokens before the summarizer kicks in. Specialists get smaller budgets appropriate to their scope. The orchestration is designed so no single agent ever has to juggle the full session.
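
Purely for illustration, that budget split might be declared as a simple table. The 150K CEO figure is from the text above; the specialist numbers and the needs_summarization helper are invented placeholders.

    AGENT_BUDGETS = {
        "ceo": 150_000,                 # from the text above
        "research_specialist": 60_000,  # placeholder
        "coding_specialist": 80_000,    # placeholder
    }

    def needs_summarization(agent: str, tokens_used: int) -> bool:
        # Trigger the summarizer once an agent exceeds its working budget.
        return tokens_used > AGENT_BUDGETS.get(agent, 50_000)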

Examples

  • A model with a 200K-token context window can hold about 150,000 English words of input (at the common rule of thumb of roughly 0.75 words per token).
  • An agent summarizing its history every 80K tokens to stay inside budget.
  • Retrieval injecting the three most relevant paragraphs instead of a whole document, as in the sketch below.
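
A toy version of that retrieval example, using naive word overlap as the relevance score. A production system would score with embeddings, but the budgeting idea (inject the top k paragraphs, not the whole document) is the same; top_paragraphs is a hypothetical name.

    def top_paragraphs(query: str, document: str, k: int = 3) -> list[str]:
        # Score each paragraph by word overlap with the query and keep
        # the k best; embeddings would replace this in production.
        query_words = set(query.lower().split())
        paragraphs = [p for p in document.split("\n\n") if p.strip()]
        scored = sorted(
            paragraphs,
            key=lambda p: len(query_words & set(p.lower().split())),
            reverse=True,
        )
        return scored[:k]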

Related terms