Web4Guru AI Operations

The Browser Specialist

Drives a real Chromium browser via Playwright. Fills forms, logs into portals the owner has authorized, extracts structured data, captures screenshots, and runs visual QA. It never spends money and never clicks Buy or Submit without approval.

When the CEO calls on this specialist

From ceo/tools.ts: “USE WHEN: task needs JS execution, form filling, login-walled pages, multi-step navigation, or visual QA of a shipped page. DO NOT USE WHEN: a static public page fetch would do (use WebFetch), the task is research across many sources (use research), or two+ browser tasks are queued (sequence them; only one browser at a time).”

What they take as input

  • task_id — kebab-case slug, e.g. acme-portal-scrape-prices
  • brief — starting URL, what to look for, what to extract or fill, definition of done. If credentials or payment info are involved, the brief must state that the specialist must stop and ask for approval first.
  • context_files (optional, for awareness only — the specialist has no file-read tools; pull relevant context into the brief text itself)
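For illustration, the input contract above can be sketched as a TypeScript shape. The interface name and the slug-validation helper are hypothetical — they are not taken from ceo/tools.ts — but the fields and the kebab-case convention match the list above:

```typescript
// Hypothetical sketch of the delegation input; field names follow the doc,
// the interface and validator are illustrative assumptions.
interface BrowserSpecialistInput {
  task_id: string;          // kebab-case slug, e.g. "acme-portal-scrape-prices"
  brief: string;            // starting URL, goal, what to extract/fill, definition of done
  context_files?: string[]; // awareness only — the specialist has no file-read tools
}

// Checks the kebab-case convention: lowercase alphanumeric runs joined by hyphens.
function isValidTaskId(taskId: string): boolean {
  return /^[a-z0-9]+(-[a-z0-9]+)*$/.test(taskId);
}
```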

What they produce

  • Screenshots under ~/.blackbox/browser-state/<task_id>/screenshots/
  • Structured extracted data (JSON-shaped fields from the brief) returned to the CEO
  • A one-paragraph summary naming what was done, where screenshots live, any blockers
  • An approval_needed card before any destructive click, an error_alert card on failure

Tools they have access to

From apps/engine/src/specialists/browser/spec.ts — a small, safety-first whitelist:

  • browser_open, browser_click, browser_fill, browser_read, browser_screenshot, browser_wait_for, browser_extract_structured, browser_close
  • CEO shared: emit_owner_card, append_lesson, write_inter_agent_note, request_peer_review, request_replan
  • Project-room tools
  • No Read, Write, Edit, Bash, or WebFetch — it drives a browser, not a filesystem.
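The deny-by-default posture can be sketched as a simple allowlist check. The tool names come from the list above; the const-array structure and the guard function are assumptions about how spec.ts might express it, not its actual code:

```typescript
// Illustrative allowlist in the spirit of browser/spec.ts (structure assumed).
const BROWSER_TOOLS = [
  "browser_open", "browser_click", "browser_fill", "browser_read",
  "browser_screenshot", "browser_wait_for", "browser_extract_structured",
  "browser_close",
] as const;

const SHARED_TOOLS = [
  "emit_owner_card", "append_lesson", "write_inter_agent_note",
  "request_peer_review", "request_replan",
] as const;

// Deny by default: anything not listed (Read, Write, Edit, Bash, WebFetch…) is rejected.
function isAllowedTool(name: string): boolean {
  return (BROWSER_TOOLS as readonly string[]).includes(name)
    || (SHARED_TOOLS as readonly string[]).includes(name);
}
```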

Workspace

setupBrowserWorkspace(taskId) creates ~/.blackbox/browser-state/<taskId>/ with a pre-made screenshots/ subdirectory. Idempotent. The delegate manages session state (cookies, localStorage) in that directory so multi-step flows can resume.
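A minimal sketch of what setupBrowserWorkspace is described as doing — the return shape is an assumption, but mkdirSync with recursive: true gives the idempotence the doc claims:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import * as os from "node:os";

// Sketch only: creates ~/.blackbox/browser-state/<taskId>/screenshots/.
// recursive: true is a no-op when the directories already exist, so
// calling this twice for the same taskId is safe (idempotent).
function setupBrowserWorkspace(taskId: string): { root: string; screenshots: string } {
  const root = path.join(os.homedir(), ".blackbox", "browser-state", taskId);
  const screenshots = path.join(root, "screenshots");
  fs.mkdirSync(screenshots, { recursive: true });
  return { root, screenshots };
}
```

Session state (cookies, localStorage) saved under `root` by one turn is then available to the next, which is what lets multi-step flows resume.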

Example brief

From an owner message like “QA the new landing page on mobile and take screenshots”, the CEO would call:

delegate_to_browser_specialist(
  task_id: "pam-strategy-mobile-qa",
  brief: "Open https://pam-strategy-landing.vercel.app at viewport 390x844 (iPhone 14). Take a full-page screenshot. Scroll through all three sections; screenshot each. Click the footer CTA — confirm it opens the Calendly URL in a new tab; screenshot the Calendly page load. Do NOT book anything — just confirm the URL loads. Report any visual breaks (overlapping text, cut-off buttons, broken images). Definition of done: four screenshots + QA findings.",
)

Example output

Browser task pam-strategy-mobile-qa: success
Workspace: ~/.blackbox/browser-state/pam-strategy-mobile-qa
Summary: Ran mobile QA on pam-strategy-landing. Hero and problem grid render cleanly at 390x844. Footer CTA opens the Calendly page correctly. One visual issue: the second problem-grid card has a 2px overflow on the right edge — screenshot attached. No layout shift, fonts load fast.
Screenshots: hero.png, problem-grid.png, footer-cta.png, calendly-load.png
Extracted data: { "overflow_found": true, "overflow_location": "problem-grid card 2" }

Related specialists

Browser pairs most with Coding (Coding ships a page; Browser QAs it) and Research (Research finds URLs on the open web; Browser goes in where auth is needed). It occasionally feeds Business Ops when the extracted data is a list of invoices to record.

Frequently asked

Can it sign up for trials on my behalf?
Only after an approval_needed card. Sign-ups commit the owner to something; the specialist won't click Submit without explicit approval.
Does it persist cookies?
Yes, inside the task workspace. So a multi-turn flow (log in → navigate → extract) can complete across turns.
What if the target site blocks bots?
It will fail gracefully and emit an error_alert. It does not try to circumvent bot detection.
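The cookie-persistence answer above can be sketched in terms of Playwright's real storageState mechanism (context.storageState({ path }) to save, newContext({ storageState }) to load). The file name "storage-state.json" and these helper functions are assumptions, not the delegate's actual code:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helpers for resuming a session from the task workspace.
function sessionStatePath(workspaceRoot: string): string {
  return path.join(workspaceRoot, "storage-state.json"); // assumed file name
}

// Returns options for browser.newContext(): resume saved cookies/localStorage
// if a previous turn wrote them, otherwise start a fresh context.
function contextOptions(workspaceRoot: string): { storageState?: string } {
  const state = sessionStatePath(workspaceRoot);
  return fs.existsSync(state) ? { storageState: state } : {};
}
```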

See also