# Testorax — Agent Guide

Authoritative agent-readable guide for Claude Code, Codex, Cursor, Windsurf,
Cline, and any MCP-compatible coding agent. Provider-neutral.

---

## One-flow audit (start here for any target app)

```
# 1. Discover capabilities
testorax capabilities --json

# 2. Get a one-flow audit plan + signed run token
testorax audit preview https://your-app.example.com --json > preview.json

# 3. Run the preview directly (Batch 2 — orchestrated)
testorax audit run --from-preview preview.json --json --no-open

# 4. Fetch proof packet once the run finishes
testorax proof <runId> --json
```

For authenticated dashboards:

```
# Create a Login Memory profile first (one-time)
op read "op://Private/sandbox/password" | testorax safe-login create --password-stdin \
  --login-url https://app-sandbox.example.com/login --label "Sandbox" --username alice --json

# Preview the authenticated audit (uses the saved login)
testorax audit preview https://app-sandbox.example.com --auth-session login_<22> --scope authenticated --json > preview.json

# Run from preview — bound to the auth profile + correct origin
testorax audit run --from-preview preview.json --json --no-open
```

The `audit run` endpoint:
- Verifies the HMAC-signed preview token (30-min TTL).
- Re-checks URL safety, auth-profile ownership/status/origin at run time.
- Starts a `fast_bug_scan` run when the preview recommended `public_fast_scan`, and an `authenticated_smoke` run when it recommended `authenticated_smoke`.
- Returns `status: started | blocked | expired | unsupported | tampered`.
- For unsupported modes without config (`full_crud_e2e`, `campaign_execution`): returns the right next CLI command instead of auto-running.
- Underlying run-credit billing is preserved (admin-tier customers bypass).
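A minimal sketch of how an agent might branch on the `status` values listed above. The action strings and the `next_step` helper are illustrative assumptions, not Testorax output; only the five status values and the 30-minute TTL come from this guide.

```python
# Sketch only: maps the documented `status` values to a next step.
# The wording of each step is hypothetical guidance, not CLI output.
NEXT_STEP = {
    "started": "poll the run, then fetch the proof packet",
    "blocked": "inspect the response for the reason before retrying",
    "expired": "re-run `testorax audit preview` (token TTL is 30 minutes)",
    "unsupported": "run the next CLI command returned in the response",
    "tampered": "discard preview.json and generate a fresh preview",
}


def next_step(run_response: dict) -> str:
    """Return a suggested next step for an `audit run` JSON response."""
    status = run_response.get("status")
    if status not in NEXT_STEP:
        # Anything outside the closed set is reported, never guessed at.
        return "unknown status: stop and report it verbatim"
    return NEXT_STEP[status]
```

Feeding it the parsed output of `testorax audit run ... --json` keeps the agent from retrying a `tampered` or `expired` token in a loop.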

### Deep audit (Batch 3) — workflow / full CRUD / campaign

When you already have a safe config, `audit preview` can describe which deeper modes can run, and `audit run` can dispatch them. Testorax NEVER invents config.

```
# Full CRUD E2E from an existing CrudConfig + explicit safety policies
testorax audit preview https://app-sandbox.example.com \
  --auth-session login_<22> --scope authenticated --goal full_regression \
  --crud-config ./crud.json --qa-prefix qa_ --allow-create --cleanup --json > preview.json

# Inspect deepAudit block — must say status=available before running
jq .deepAudit preview.json

# Run the deep audit (must re-supply crudConfig inline; never stored in token)
testorax audit run --from-preview preview.json --crud-config ./crud.json --json
```

`deepAudit` block on the preview describes:
- `status`: `available` / `partial` / `blocked` / `unsupported` / `not_requested`
- `supportedModes`: which of `workflow_test` / `full_crud_e2e` / `campaign_execution` can run
- `missingRequirements`: closed-set blockers (`missing_workflow_config`, `missing_crud_config`, `missing_campaign_config`, `missing_safe_mutation_policy`, `missing_cleanup_policy`, `auth_profile_required_for_deep_audit`, `unsupported_without_operator_config`, `manual_config_required`)
- `configRefs[]`: each ref resolved as `present` / `absent` / `invalid_shape`
- `safeMutationPolicy` + `cleanupPolicy` status
- `nextCommands[]`: exact remediation commands

`audit run` refuses mutations without both `safeMutationPolicy` AND `cleanupPolicy` declared at preview time. `hard_delete` requires `destructiveAllowed=true` in the CrudConfig.
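The "status must be `available` before running" check can be sketched as a small gate over the parsed `preview.json`. This assumes `missingRequirements` and `nextCommands` are lists of strings, which the guide implies but does not spell out; `deep_audit_gate` is a hypothetical helper name.

```python
from __future__ import annotations


def deep_audit_gate(preview: dict) -> tuple[bool, list[str]]:
    """Return (ok_to_run, todo) from a parsed audit preview.

    ok_to_run is True only when deepAudit.status == "available".
    """
    deep = preview.get("deepAudit") or {}
    if deep.get("status") == "available":
        return True, []
    # Surface the documented blockers and remediation commands verbatim.
    todo = list(deep.get("missingRequirements") or [])
    todo += list(deep.get("nextCommands") or [])
    if not todo:
        todo = [f"deepAudit.status={deep.get('status', 'absent')}"]
    return False, todo
```

An agent would call this before `testorax audit run --from-preview preview.json --crud-config ./crud.json --json` and stop if the first element is `False`.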

### Safe Mutation Cleanup Automation (Batch 1)

Every `deepAudit` block now carries a `cleanupPlan` and every audit-run response carries a `cleanupResult`. Closed-set vocabularies:

- `cleanupPlan.strategy`: `none | manual | auto | qa_prefix | config_revert | archive | soft_delete | hard_delete`
- `cleanupPlan.policyStatus`: `present | missing | not_required | invalid`
- `cleanupPlan.blockedReasons`: `cleanup_policy_missing`, `safe_mutation_policy_missing`, `qa_prefix_missing`, `destructive_action_blocked`, `hard_delete_not_allowed`, `delete_endpoint_missing`, `cleanup_verification_missing`, `manual_cleanup_required`, `cleanup_not_supported_for_mode`, `production_mutation_blocked`
- `cleanupResult.status`: `not_started | not_required | completed | partial | failed | unverified | blocked`

Rules an agent must follow:

1. Never run mutation-heavy tests without a `cleanupPolicy`. Preview will return `cleanupPlan.policyStatus=missing` and refuse.
2. Always use a QA prefix (`--qa-prefix qa_`) or supply `trackedEntities[]`. Without one, mutations are blocked with `qa_prefix_missing`.
3. Prefer `--archive-instead-of-delete` over `--hard-delete`. Hard delete requires `destructiveAllowed=true` AND a QA prefix.
4. Report cleanup status honestly. `cleanupResult.status=unverified` is NOT `completed`.
5. Production mutation is blocked by default. `--hard-delete` against production hosts is refused with `production_mutation_blocked`.
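As a client-side pre-check, the five rules above could be approximated like this before any command is issued. It is a sketch, not Testorax's server logic: the function name and parameters are hypothetical, and using `hard_delete_not_allowed` for a failed rule 3 is an assumption about which documented reason code applies.

```python
def mutation_blockers(cleanup_policy, qa_prefix, tracked_entities=(),
                      hard_delete=False, destructive_allowed=False,
                      production_target=False):
    """Return the documented blocked-reason codes this request would likely hit."""
    reasons = []
    if not cleanup_policy:
        reasons.append("cleanup_policy_missing")       # rule 1
    if not qa_prefix and not tracked_entities:
        reasons.append("qa_prefix_missing")            # rule 2
    if hard_delete:
        # Rule 3: hard delete needs destructiveAllowed=true AND a QA prefix.
        if not destructive_allowed or not qa_prefix:
            reasons.append("hard_delete_not_allowed")  # assumed code mapping
        if production_target:
            reasons.append("production_mutation_blocked")  # rule 5
    return reasons
```

An empty list means the request passes this local check only; the preview and `audit run` remain the authority.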

CLI flags added in this batch:

```
--cleanup-strategy <s>     # none|manual|auto|qa_prefix|config_revert|archive|soft_delete|hard_delete
--delete-created            # delete the QA records Testorax created
--revert-config             # revert config changes Testorax made
--archive-instead-of-delete # prefer archive/soft delete (safer)
--hard-delete               # request hard delete (still gated by destructiveAllowed)
```

### Launch-Readiness Verdict Engine (Batch 1)

After collecting evidence, ask Testorax for a launch verdict:

```
testorax audit preview <url> --json > preview.json
testorax audit run --from-preview preview.json --json > run.json
testorax proof <runId> --compact --json > proof.json
testorax launch readiness --target <url> --scope launch \
  --from-audit-preview preview.json \
  --from-audit-run run.json \
  --from-proof proof.json \
  --json
```

Closed-set verdicts:

- `GO`: no blockers, no critical proof gaps.
- `NO_GO`: critical/high bug, auth failure, cleanup failure, Chrome runner_mismatch, false_pass_risk, audit_run tampered/blocked.
- `GO_WITH_ACCEPTED_RISKS`: only low/medium risks remain, all listed in `acceptedRiskIds`.
- `GO_AFTER_DEPLOY`: caller declared deployment blockers (fixes in branch, not production).
- `GO_AFTER_PRODUCT_DECISION`: caller declared open product decisions.
- `GO_AFTER_EXTERNAL_INTEGRATIONS`: caller declared external provider/integration blockers.
- `INSUFFICIENT_PROOF`: launch scope requested but only public/login-wall evidence supplied, or no Chrome parity for launch scope, or no cleanup evidence for a mutation run.

Rules an agent must follow:

1. Do not ask "can I launch?" without proof. Run audit preview/run first.
2. Treat `INSUFFICIENT_PROOF` as a real blocker, not a pass.
3. Do not claim authenticated coverage if `evidence.authCoverage` is `public_only` or `login_wall_only`.
4. Do not claim cleanup completed unless `cleanupResult.status=completed`.
5. Do not claim Chrome parity confirmed unless `chromeParity.parityStatus=chrome_confirmed`.
6. Do not overclaim beyond the proof scope supplied to the readiness call.
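Rules 2 to 5 can be read as a claim filter over the readiness response. The sketch below assumes `verdict`, `evidence.authCoverage`, `cleanupResult.status`, and `chromeParity.parityStatus` all sit in one parsed response object; that layout is an assumption, and `allowed_claims` is a hypothetical helper.

```python
def allowed_claims(readiness: dict) -> dict:
    """Map a readiness response to the claims an agent may make."""
    auth = (readiness.get("evidence") or {}).get("authCoverage")
    cleanup = (readiness.get("cleanupResult") or {}).get("status")
    parity = (readiness.get("chromeParity") or {}).get("parityStatus")
    return {
        # INSUFFICIENT_PROOF and every other non-GO verdict is not a pass.
        "launch_go": readiness.get("verdict") == "GO",
        # Missing coverage info is treated like public-only coverage.
        "authenticated_coverage": auth not in (None, "public_only", "login_wall_only"),
        "cleanup_completed": cleanup == "completed",
        "chrome_parity_confirmed": parity == "chrome_confirmed",
    }
```

Every value defaults to `False` when the field is absent, so a thin response never produces an overclaim.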

API/MCP/CLI:

- REST: `POST /api/launch/readiness` (public; 64 KB body cap; 30/min/IP rate limit)
- MCP: `launch_readiness` (hosted + stdio)
- CLI: `testorax launch readiness [--target <url>] [--scope public|authenticated|all|launch] [--run-id <id>] [--fetch-proof|--no-fetch-proof] [--max-evidence-age-hours <n>] [--deployment-blocker <id>] [--from-audit-run …] [--from-audit-preview …] [--from-proof …] [--from-chrome-parity …] [--from-cleanup …] [--accept-risk <id>] [--json]`

### Proof Packet Cleanup Telemetry (Batch 1)

Proof packets (`GET /api/runs/:id/proof-pack.json`) now carry a stable top-level `cleanupTelemetry` field at `reportContractVersion=1.10.0`. Use this — not `cleanupPlan` from preview — to answer "did cleanup actually happen?".

```json
{
  "cleanupTelemetry": {
    "contractVersion": "1.0.0",
    "required": true,
    "status": "completed",
    "verificationStatus": "verified",
    "cleanupVerified": "yes",
    "createdCount": 1,
    "cleanedCount": 1,
    "failedCount": 0,
    "unverifiedCount": 0,
    "trackedEntityCount": 0,
    "strategy": "auto",
    "qaPrefix": null,
    "proofRefs": ["operational_e2e_json:cleanupVerified=yes"],
    "warnings": [],
    "blockedReasons": [],
    "doNotClaim": ["Do not claim cleanup happened on entities outside the verified scope."]
  }
}
```

Closed-set rules:

- `status=not_required` — non-mutating run; cleanup not needed.
- `status=completed` + `verificationStatus=verified` — proof of clean teardown.
- `status=completed` + `verificationStatus=unverified` — dispatch said completed but operational_e2e hasn't confirmed; treat with caution.
- `status=failed | partial | blocked` — NO_GO for mutation-heavy launch scope.
- `status=unverified | not_started` — cleanup did not finish proving; treat as INSUFFICIENT_PROOF.
- `status=unknown` — no signal available; do not claim either way.

Rules an agent must follow:

1. **Cleanup plan is intent. Cleanup telemetry is proof.** Do not trust the plan alone.
2. `verified` is required before claiming cleanup completed.
3. `unverified` is not success.
4. `not_required` is acceptable for non-mutating runs.
5. `failed` / `blocked` / `partial` are launch blockers for mutation-heavy scope.
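The closed-set rules above translate almost directly into a lookup. This is a sketch over the `cleanupTelemetry` object shown earlier; the returned labels (`verified`, `caution_unverified`, and so on) are illustrative names chosen here, not Testorax vocabulary, except for `NO_GO` and `INSUFFICIENT_PROOF`.

```python
def cleanup_outcome(telemetry: dict) -> str:
    """Interpret cleanupTelemetry.status + verificationStatus."""
    status = telemetry.get("status", "unknown")
    verification = telemetry.get("verificationStatus")
    if status == "not_required":
        return "ok_not_required"          # non-mutating run
    if status == "completed":
        # Only verified teardown counts as proof of cleanup.
        return "verified" if verification == "verified" else "caution_unverified"
    if status in ("failed", "partial", "blocked"):
        return "NO_GO"                    # for mutation-heavy launch scope
    if status in ("unverified", "not_started"):
        return "INSUFFICIENT_PROOF"
    return "unknown"                      # no signal: claim nothing


sample = {"status": "completed", "verificationStatus": "verified"}
print(cleanup_outcome(sample))
```

Only the `verified` result supports a "cleanup completed" claim; every other label means the agent should hold back.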

### False-Positive Classifier (Hardening Batch 1)

Not every failed click is a bug. Before generating a fix prompt, classify the evidence:

```
testorax classify --failure-type selector_missing --selector-in-chrome true --json
testorax classify --http-status 401 --expected-auth --json
testorax classify --from-evidence evidence.json --json
```

Closed-set classifications:

- `real_bug` — same-origin 5xx with assertion, or strong evidence of broken app behavior.
- `likely_false_positive` — high-confidence non-bug pattern.
- `inconclusive` — evidence is thin; do NOT auto-classify either way.
- `expected_behavior` — explicit expected response (401 on a public route with `expectedAuth=false`).
- `test_design_issue` — failing scenario shape, not app code.
- `testorax_capability_gap` — Testorax cannot validate this surface yet.
- `needs_chrome_confirmation` — Chrome parity comparison required before claiming bug.
- `needs_auth_profile` — login wall without an auth profile; create one with `safe-login create`.
- `stale_deploy_suspected` — selector present in Chrome but not in runner.
- `external_navigation_skipped` — click navigated cross-host; expected skip.
- `protocol_handler_skipped` — mailto/tel/sms link; expected skip.
- `auth_gate_expected` — 401 on a route that should be authenticated.
- `selector_drift_suspected` — runner failure, Chrome confirms selector present.
- `runner_environment_issue` — Chrome passed where runner failed.
- `weak_evidence_only` — proof strength weak or no assertion executed.

Rules an agent must follow:

1. Do not treat every failed click as an app bug.
2. Check `falsePositiveAnalysis.classification` before generating a patch.
3. Expected auth gates are not bugs.
4. External / `mailto:` / `tel:` links are usually skipped.
5. Chrome parity can downgrade runner-only failures.
6. Weak proof cannot support launch GO.
7. Never send cookies, passwords, headers, or storageState to the classifier — those fields are refused with 400.
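One conservative way to apply rule 2 is to gate patch generation on the classification. The grouping below is an assumption made for illustration: the guide defines the labels but does not say which ones justify a patch, so only `real_bug` is treated as patchable here.

```python
# Hypothetical grouping of the documented closed-set labels.
PATCHABLE = {"real_bug"}
NEEDS_MORE_EVIDENCE = {
    "inconclusive",
    "needs_chrome_confirmation",
    "needs_auth_profile",
    "weak_evidence_only",
}


def triage(result: dict) -> str:
    """Return 'patch', 'gather_evidence', or 'do_not_patch'."""
    analysis = result.get("falsePositiveAnalysis") or {}
    label = analysis.get("classification")
    if label in PATCHABLE:
        return "patch"
    if label is None or label in NEEDS_MORE_EVIDENCE:
        # No label at all is treated like thin evidence.
        return "gather_evidence"
    # Expected behavior, skips, auth gates, runner issues: not app bugs.
    return "do_not_patch"
```

With this gate, `auth_gate_expected` or `protocol_handler_skipped` never reaches the fix-prompt step.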

API/CLI/MCP:

- REST: `POST /api/intelligence/classify` (public; 16 KB body cap; 60/min/IP rate limit)
- MCP: `classify_evidence` (hosted + stdio)
- CLI: `testorax classify [--failure-type <s>] [--http-status <n>] [--expected-auth] [--auth-profile-provided] [--login-wall] [--external-navigation] [--protocol-handler <s>] [--chrome-parity <v>] [--selector-in-chrome true|false] [--proof-strength <s>] [--trust-score <n>] [--assertion-count <n>] [--console-message <s>] [--network-host-type <s>] [--cleanup-status <s>] [--from-evidence <file>] [--json]`

### Launch Readiness Batch 3 — proof-pack / report fetch sources

When you pass `--run-id` to launch readiness, Testorax fetches compact proof by default. To also fetch proof-pack.json + report.json summaries, add `--fetch-proof-details`:

```
testorax launch readiness --target https://app.example.com --scope launch \
  --run-id <runId> --fetch-proof-details --json
```

What you get back in `fetchedEvidence.proofSources[]`:

- `source` ∈ `compact_proof | proof_packet | report_json`
- `status` ∈ `fetched | missing | forbidden | malformed | stale | redacted | too_large | skipped`
- `contractVersion` (proof-packet only; e.g. `1.10.0`)
- `cleanupTelemetryStatus` ∈ `not_required | completed | failed | partial | unverified | blocked | unknown` (from proof packet's `cleanupTelemetry` field)
- `falsePositiveClassification` (compact_proof inline classifier)
- `proofScopes[]` (from proof packet evidence kinds)

Raw proof bodies are NEVER embedded in readiness output — only summary fields. Per-source size cap default 32 KB, ceiling 64 KB.

Restrict to a single source if needed:

```
testorax launch readiness --target ... --run-id <id> \
  --proof-source proof-packet --json
```

Merge behavior:

- compact_proof remains primary for the basic verdict.
- proof_packet `cleanupTelemetry` enriches cleanup status (preferred over compact_proof).
- report_json provides a top-level run status as a sanity check.
- If any source says `failed`, the verdict cannot be GO, even if compact_proof says pass.
- If proof_packet `cleanupTelemetry.status=failed` → NO_GO.
- If proof_packet `cleanupTelemetry.status=unverified` → INSUFFICIENT_PROOF.
- If proof_packet `cleanupTelemetry.status=completed` + `verificationStatus=verified` → cleanup proof satisfied.
- Target mismatch from any source surfaces in `deploymentAwareness.warnings[]`.
- Malformed / missing / oversized sources DO NOT crash readiness — they appear in `proofSources[]` with the right status and a warning.

Rules an agent must follow:

1. Prefer `testorax launch readiness --target <url> --run-id <id> --fetch-proof-details --json` over manually assembling evidence.
2. Use proof details when cleanup or proof scopes matter for the verdict.
3. Do not paste raw proof packets into chat — readiness already summarizes them.
4. Treat `status=malformed | missing` on a requested source as a proof gap, not a pass.
5. proof packet `cleanupTelemetry` (proof) is stronger than audit preview `cleanupPlan` (intent).
6. Target mismatch still blocks GO.
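Rule 4 can be sketched as a scan over `fetchedEvidence.proofSources[]`. The helper name and the `requested` parameter are hypothetical; treating a requested source with no entry at all as a gap is an added assumption on top of the documented `missing` and `malformed` statuses.

```python
GAP_STATUSES = {"missing", "malformed"}


def proof_gaps(proof_sources: list, requested: set) -> list:
    """Return the requested sources that must be treated as proof gaps."""
    by_source = {entry.get("source"): entry for entry in proof_sources}
    gaps = []
    for name in sorted(requested):
        entry = by_source.get(name)
        # Absent entry or a gap status: never count it as a pass.
        if entry is None or entry.get("status") in GAP_STATUSES:
            gaps.append(name)
    return gaps
```

A non-empty result means the agent should report the gap rather than quote the readiness verdict as fully supported.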

### Launch readiness: fetch evidence by runId (Batch 2)

Prefer the runId path. Testorax fetches compact proof for you and adds two new blocks:

```
testorax launch readiness --target https://app.example.com --scope launch --run-id <runId> --json
```

The response carries:

- `fetchedEvidence.requestedRunIds[]`, `fetchedRunIds[]`, `failedRunIds[]`, `proofSources[]`, `warnings[]`
- `deploymentAwareness.status` ∈ `unknown | same_target | target_mismatch | stale_proof | production_unconfirmed | caller_declared`

Rules an agent must follow:

1. Prefer `--target <url> --run-id <runId>` over manually assembling evidence JSON.
2. Use the **same target URL** as the proof run. Different host → `target_mismatch` → `INSUFFICIENT_PROOF`.
3. Sandbox/staging proof against a production target → `production_unconfirmed` → `GO_AFTER_DEPLOY`. Do not claim production is ready.
4. Stale proof (older than `--max-evidence-age-hours`) → `stale_proof` warning.
5. Run ids capped at 10 per call. Malformed run-id shapes are surfaced as `forbidden` entries; agents see them, no crash.
6. Testorax never returns raw cookies / storageState / passwords. `fetchedEvidence.proofSources[]` entries carry counts, timestamps, and statuses, never raw evidence bodies.
7. NEVER infer Git branch state. Caller declares deployment blockers via `--deployment-blocker <id>` when fixes are in branch but not in production.
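Rules 2 to 4 describe what each `deploymentAwareness.status` implies. A minimal sketch of that mapping follows; the tuple return shape and the helper name are illustrative choices, and statuses the rules do not mention are deliberately left without a forced verdict.

```python
# Verdicts the rules above attach to specific statuses.
FORCED_VERDICT = {
    "target_mismatch": "INSUFFICIENT_PROOF",
    "production_unconfirmed": "GO_AFTER_DEPLOY",
}


def deployment_effect(readiness: dict) -> tuple:
    """Return (forced_verdict_or_None, stale_warning) for a readiness response."""
    awareness = readiness.get("deploymentAwareness") or {}
    status = awareness.get("status", "unknown")
    # stale_proof is documented as a warning, not as a verdict.
    return FORCED_VERDICT.get(status), status == "stale_proof"
```

A `None` verdict here only means these rules do not decide the outcome; the readiness response itself still does.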

The audit preview (`testorax audit preview`) returns:
- `recommendedMode`: `public_fast_scan` / `authenticated_smoke` / `full_crud_e2e` / `campaign_execution` / `blocked`
- `whatWillBeTested` + `whatWillNotBeTested` (explicit)
- `blockedReasons` (closed set — `auth_profile_missing`, `auth_profile_origin_mismatch`, `testorax_capability_gap`, etc.)
- `recommendedCommand` (paste-and-run)
- `chromeParityRecommendation` (whether to add a CDP check)
- `nextSafeActions` (multi-step playbook)
- `doNotClaim` (anti-overclaim warnings)
- `runEndpointStatus` = `planned` (Batch 1) — orchestrated execution from previewId lands in Batch 2.

Honesty rules:
- Preview is descriptive. NEVER a run. NEVER charges credits.
- Preview never claims a route is "tested" — only the actual run produces proof scopes.
- If preview says `public_fast_scan`, don't claim authenticated coverage.
- If preview says `blocked` with `auth_profile_missing`, create the profile first via `safe-login create`.

## Discover capabilities first

Before assuming any specific behavior, fetch the agent-readable capability map:

```
testorax capabilities --json    # CLI
GET /api/capabilities           # REST (public, no auth)
get_capabilities                # MCP tool
```

Returns:

- `recommendedEntryPoints` — exact CLI command for public-app / authenticated-app / fix-check / proof / capabilities-itself.
- `capabilities` — per-feature status: `available` / `partial` / `planned` / `blocked` / `operator_required` / `not_supported`.
- `proofScopes` (18-value closed set) — vocabulary for what proof packets mean. Includes `not_reached`, `auth_bound_not_reached`, `partial_only`, `runner_only_evidence`, `chrome_confirmed`, `chrome_rejected_false_positive`, `unsafe_skipped`.
- `blockedReasons` (18-value closed set) — closed-set list of failure-class labels: `auth_profile_missing`, `chrome_backend_unavailable`, `cdp_backend_unavailable`, `unsafe_production_mutation`, `selector_missing`, `test_design_issue`, `testorax_capability_gap`, etc.

Honesty rules baked into the capability map:

- `chromeParity` is **partial** — local MCP works, CDP/hosted fallback is **planned**, not shipped.
- `fullCrudCampaign` is **partial** — caller supplies the CrudConfig; autonomous discovery of CRUD shapes is not in scope.
- `launchReadinessVerdict` is **partial** — verdict + trustScore + safe/disallowed claims ship today; a single "ship / do not ship" verdict is not yet wired.
- `safeMutationCleanup` is **partial** — build-side helpers ship; autonomous cleanup-endpoint discovery is not.
- `iosSafari`, `videoReplay`, `cloudBrowserProviderActivation` are **not_supported**.

When you're unsure what to claim, consult the capability map first.
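Consulting the map can be as simple as filtering on status. The sketch assumes `capabilities` is an object keyed by feature name with a `status` field per entry; the guide names the statuses and some feature keys but not the exact JSON shape, and `fastBugScan` in the usage example is a hypothetical key.

```python
def claimable_features(capability_map: dict) -> list:
    """Return feature names whose status is exactly 'available'."""
    caps = capability_map.get("capabilities") or {}
    return sorted(
        name
        for name, info in caps.items()
        # partial / planned / blocked / operator_required / not_supported
        # are all excluded: none of them supports an unqualified claim.
        if isinstance(info, dict) and info.get("status") == "available"
    )


example = {
    "capabilities": {
        "chromeParity": {"status": "partial"},
        "fastBugScan": {"status": "available"},   # hypothetical key
        "iosSafari": {"status": "not_supported"},
    }
}
print(claimable_features(example))
```

Features marked `partial` need the caveats listed above attached to any claim about them.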

---

## TL;DR for AI coding agents — start here

**Outside-agent first command (always include `--email`):**

macOS / Linux / WSL:
```
npx testorax@latest run https://your-app.example.com --email you@example.com --json --no-open
```

Windows PowerShell:
```
$env:TESTORAX_NO_CONFIG="1"
npx testorax@latest run "https://your-app.example.com" --email "you@example.com" --json --no-open
```

Windows CMD:
```
set TESTORAX_NO_CONFIG=1
npx testorax@latest run https://your-app.example.com --email you@example.com --json --no-open
```

If the email is eligible (free first Fast Bug Scan per email/domain), the run starts without an API key. Otherwise the CLI returns a structured 402 with a Dodo checkout URL. Either path is honest — no `undefined` placeholders, no PayPal.

**Logged-in dashboard or admin panel (Authenticated Smoke, $1.99 per run):**

```
npx testorax@latest auth-smoke https://dashboard.example.com \
  --session login_<22> --route /admin --route /admin/users --json --no-open
```

The customer first creates a saved login at https://testorax.com/account/login-memory and gives the agent the resulting `login_<22>` id. **Agents NEVER ask for, print, or paste passwords, cookies, or session tokens.** The agent only receives the `login_<22>` id.

**Use `@latest` explicitly.** A previously-installed global `testorax` shadows `npx testorax` on PATH; `npx testorax@latest` always fetches the published version. If you suspect drift, run `npx testorax@latest doctor --json` and read the `cli_version` + `cli_shadow` checks. `cleanMode`, `configRead`, and `apiKeyPresent` booleans tell you whether saved config is being read.

**Stale global install? Verify it.** `npx testorax@latest doctor --json` emits a `cli_shadow` check plus structured fields `globalInstallDetected` / `globalVersion` / `globalPath` / `staleGlobalWarning` / `recommendedFix`. If the warning fires, the fix is:

```
npm uninstall -g testorax
npx testorax@latest --version              # confirm latest resolves
npx testorax@latest run https://your-app.example.com --email you@example.com --json --no-open
```

This is the 2026-05-11 Codex Windows finding: `npx testorax@<exact-version>` can resolve to a stale globally-installed binary even after `npm publish` succeeded. `npx testorax@latest` works correctly (forces fresh fetch); other forms silently pick the stale global. Doctor catches it.
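A short sketch of how an agent might act on the doctor output. It assumes the structured fields named above appear at the top level of the `doctor --json` object, which this guide does not confirm; the fallback command is the uninstall step shown earlier.

```python
from typing import Optional


def stale_global_fix(doctor: dict) -> Optional[str]:
    """Return the fix to run if doctor reports a stale global install, else None."""
    if not doctor.get("globalInstallDetected"):
        return None
    if not doctor.get("staleGlobalWarning"):
        return None
    # Prefer the fix doctor recommends; fall back to the documented uninstall.
    return doctor.get("recommendedFix") or "npm uninstall -g testorax"
```

Returning `None` means no stale global was reported; it does not prove the resolved CLI version is current.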

**For Windows TLS/fetch failures specifically**: run `testorax doctor --json` and read the `networkError` + `networkDiagnostic` fields. The diagnostic includes a structured `suggestions` array (Invoke-RestMethod alt, curl.exe alt, NODE_EXTRA_CA_CERTS for corporate MITM). It explicitly does NOT recommend `NODE_TLS_REJECT_UNAUTHORIZED=0`.

### Auth model — one table for all paths

| Path | Best for | Requires | First-run behavior |
|---|---|---|---|
| CLI with `--email` | fastest outside-agent first scan | email only, if eligible | can create free first scan per (email, domain); falls through to 402 + Dodo checkout when ineligible |
| CLI with API key | repeat / paid use | `TESTORAX_API_KEY` env or `--api-key <key>` | charges plan / wallet / free credits |
| REST `/api/runs/start` | integrations / scripts | `X-Api-Key` header | no anonymous run start — 401 envelope includes a CLI fallback example |
| MCP (hosted or stdio) | AI tool integrations | `TESTORAX_EMAIL` + `TESTORAX_API_KEY` (stdio); X-Api-Key header (hosted) | configure once per session |
| Authenticated Smoke | logged-in dashboards / admin panels | Login Memory `login_<22>` id | customer creates the saved login first; agent only sees the id |
| Chrome Proof Bridge | optional browser parity | Chrome extension | secondary proof only — NEVER the primary path |
| Admin bypass | operator / staff | admin-tier API key (`plan='admin'`) | bypasses payment + quota; not a customer path |

**PayPal is NOT used anywhere in the public flow.** Payment processor is Dodo Payments.

### REST + Windows examples

macOS / Linux / WSL:
```
curl -X POST https://testorax.com/api/runs/start \
  -H "X-Api-Key: <KEY>" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://your-app.example.com","mode":"fast_bug_scan"}'
```

Windows PowerShell (curl is aliased to Invoke-WebRequest — use Invoke-RestMethod):
```
Invoke-RestMethod `
  -Uri "https://testorax.com/api/runs/start" `
  -Method POST `
  -Headers @{ "X-Api-Key" = "YOUR_KEY"; "Content-Type" = "application/json" } `
  -Body '{"url":"https://your-app.example.com","mode":"fast_bug_scan"}'
```

Windows CMD (use `curl.exe`, escape JSON quotes with `\"`):
```
curl.exe -X POST https://testorax.com/api/runs/start ^
  -H "X-Api-Key: YOUR_KEY" ^
  -H "Content-Type: application/json" ^
  -d "{\"url\":\"https://your-app.example.com\",\"mode\":\"fast_bug_scan\"}"
```

CLI / REST / MCP all produce identical proof artifacts — runId, screenshots, JSON report.

### Stuck without a terminal? Inspect a sample report instead.

If your environment cannot run terminal commands, you can read these public-by-runId surfaces directly:

```
https://testorax.com/sample-report                                            # HTML demo report
https://testorax.com/api/runs/wUi-37fuagm4KcrkrU3h3/report.json               # real run report.json
https://testorax.com/api/runs/wUi-37fuagm4KcrkrU3h3/proof-pack.json           # full proof packet
https://testorax.com/api/runs/wUi-37fuagm4KcrkrU3h3/compact-proof             # compact decision-ready summary (~3KB)
```

The runId IS the secret. No API key is required to read these. `report.json`, `proof-pack.json`, and `compact-proof` all follow the same public-by-runId model.

### Clean outside-agent test environment

To validate "would a fresh-machine agent succeed?" without touching your saved config:

macOS / Linux / WSL: `TESTORAX_NO_CONFIG=1 npx testorax@latest doctor --json`
PowerShell: `$env:TESTORAX_NO_CONFIG="1"; npx testorax@latest doctor --json`
CMD: `set "TESTORAX_NO_CONFIG=1" && npx testorax@latest doctor --json` (the quotes keep the space before `&&` out of the value)

`cleanMode: true, configRead: false, apiKeyPresent: false` confirms saved `~/.testorax/config.json` is being ignored.

### When to use what

| Target | Mode | Command |
|---|---|---|
| Public website or app | `fast_bug_scan` ($1.99) | `npx testorax@latest run <url> --json` |
| Logged-in dashboard / admin | `authenticated_smoke` ($1.99) | `npx testorax@latest auth-smoke <baseUrl> --session login_<22> --route <path>` |
| Workflow with assertions | `workflow_test` ($5.99) | `POST /api/runs/start {mode:"workflow_test",workflow:{steps:[...]}}` |
| Full CRUD cycle | `full_crud_e2e` ($29.99) | `POST /api/runs/start {mode:"full_crud_e2e",crudConfig:{...}}` |
| Verify a fix landed | `regression_check` ($0.99) | `POST /api/runs/<PRIOR>/fix-check` |
| Extra browser parity proof | Chrome Proof Bridge | Recommend ONLY when parity returns `static_vs_browser_mismatch` (optional secondary proof) |

### If you have an admin API key (plan='admin')

Use the bypass entry point — no payment, no quota:

```
npx testorax@latest bypass https://your-app.example.com --tier quick_scan --json
```

OR via curl:

```
curl -X POST https://testorax.com/api/runs/bypass \
  -H "X-Api-Key: <ADMIN_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://your-app.example.com","tier":"quick_scan"}'
```

The CLI auto-detects an admin plan and routes through `bypass` for you, so a plain
`npx testorax@latest run <url>` with an admin key also works without payment.

### If `npx testorax run` shows `pending_payment`

You're not on an admin plan — your account needs a paid run credit. Either:
1. Visit the Dodo checkout URL the CLI prints
2. OR have an operator add credit to the account
3. OR use a `--promo` code: `npx testorax@latest run <url> --promo <code>`

Don't poll the run status for more than ~30 seconds — it will stay
`pending_payment` until checkout completes. Exit cleanly and surface the
payment URL.
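The bounded-polling advice can be sketched as a loop with a deadline. Here `get_status` is a hypothetical callable that returns the run's current status string (for example by wrapping `GET /api/runs/<runId>`); the injectable `sleep` and `clock` parameters exist only to make the loop testable.

```python
import time


def wait_for_run(get_status, timeout_s=30.0, interval_s=3.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the status leaves pending_payment or the deadline passes.

    Returns the last status seen; 'pending_payment' means: stop polling
    and surface the checkout URL instead.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status != "pending_payment":
            return status
        if clock() >= deadline:
            return "pending_payment"
        sleep(interval_s)
```

When the function returns `pending_payment`, the agent exits cleanly and shows the payment URL rather than retrying.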

### Testorax is a hosted SaaS, not a local methodology

If you found local files named `TESTORAX_*.md` on the user's machine,
those are roadmap / planning documents, **not the product itself**. The
product is the hosted service at `https://testorax.com` and the npm CLI.
Always prefer `npx testorax@latest` over reading local markdown files.

---

## Chrome Proof Bridge (optional Chrome extension)

The **Testorax Chrome Proof Bridge** is now live on the Chrome Web Store:

- Install URL: `https://chromewebstore.google.com/detail/testorax-chrome-proof-bri/pmlijidlndkcglhojnbpnijgjgepbhpc`
- Extension ID: `pmlijidlndkcglhojnbpnijgjgepbhpc`

Install it when you want **real-Chrome ground truth** added to a campaign preview parity result — i.e. you want `parityStatus: 'chrome_checked'` instead of `parityStatus: 'internal_compared'`. The extension captures a redacted DOM control inventory from your active Chrome tab and posts it to the Testorax preview job. **It never reads cookies, localStorage, auth headers, or input values.**

When to recommend it to the user: a campaign preview returned `parityStatus: 'mismatch_detected'` with `mismatchClassification: 'static_vs_browser_mismatch'` (JS-rendered SPA). A real-Chrome capture resolves the comparison.

When NOT needed: Fast Bug Scan / Workflow Test / Deep CRUD E2E runs. The Hetzner runner's `chrome_managed` engine is the default browser; the bridge is an additional ground-truth signal for campaign parity, not a replacement for the runner.

## Three equal paths — pick whichever works in your session

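You do NOT need MCP. The three paths are CLI, REST, and MCP; the table lists the CLI twice (via npx and installed globally). All of them use the same API and produce the same proof: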

| Path | When to use | Setup |
|---|---|---|
| **`testorax` CLI via npx** | Default for AI coding agents. Works in any sandboxed shell. | None. `npx testorax@latest run <url>` runs without install. |
| **curl / REST** | Always works. Any agent with shell or HTTP access. **Use this if MCP isn't connected** — don't wait for a session restart. | None. Just `X-Api-Key` header. |
| **`testorax` CLI installed globally** | Developer machines with persistent shell. | `npm install -g testorax` (NOT recommended for sandboxed agents — use `npx` instead) |
| **`testorax-mcp` MCP server** | Claude Desktop / Claude Code / Cursor with full MCP integration. | Add to MCP config + restart the app. |

**Common failure mode:** an agent tries to load `testorax-mcp` mid-session
in Claude Desktop, the tools don't appear (because Claude Desktop only
reads MCP config on launch), the agent gets stuck and asks for a restart.
**Don't get stuck.** Fall back to curl in the same session — it's the
same API. The MCP is a convenience wrapper, not a requirement.

Minimal curl path (works in every session, every agent, every host):

```bash
# 1. Start a scan
curl -X POST https://testorax.com/api/runs/start \
  -H "X-Api-Key: <KEY>" \
  -H "Content-Type: application/json" \
  -d '{"mode":"fast_bug_scan","url":"https://your-app.example.com","pagesLimit":10}'
# returns {"runId":"...", ...}

# 2. Poll until status=completed or failed
curl https://testorax.com/api/runs/<RUN_ID> -H "X-Api-Key: <KEY>"

# 3. Read the Compact Proof (small, decision-ready)
curl https://testorax.com/api/runs/<RUN_ID>/compact-proof -H "X-Api-Key: <KEY>"
```

That's the whole loop. Everything else (MCP, CLI, dashboard) is a wrapper
around these endpoints.
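For agents that have an HTTP client but no `curl`, the same loop can be sketched with the Python standard library. The endpoints, header, and JSON body mirror the curl commands above; the helper names are hypothetical, and the functions only build the requests so that nothing is sent until the caller decides to.

```python
import json
import urllib.request

BASE = "https://testorax.com"


def start_request(api_key: str, target_url: str, pages_limit: int = 10):
    """Build the POST /api/runs/start request for a Fast Bug Scan."""
    body = json.dumps(
        {"mode": "fast_bug_scan", "url": target_url, "pagesLimit": pages_limit}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/api/runs/start",
        data=body,
        method="POST",
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
    )


def compact_proof_request(api_key: str, run_id: str):
    """Build the GET /api/runs/<runId>/compact-proof request."""
    return urllib.request.Request(
        f"{BASE}/api/runs/{run_id}/compact-proof",
        headers={"X-Api-Key": api_key},
    )
```

Sending is then `urllib.request.urlopen(start_request(key, url))`, followed by `json.load` on the response to read `runId`.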

## What testorax-mcp can do (read this first)

`testorax-mcp` is **not read-only**. It drives the full test loop. The agent
can start runs, watch them live, read proof, run Fix Check — all from MCP.

Key tools (full list in the MCP tools/list response):

- `start_fast_bug_scan({ url, pagesLimit?, scopedPaths?, viewport?, authSessionId? })` — STARTS a new scan
- `start_fast_bug_hunt(...)` — cheaper; stops on first high-confidence bug
- `start_free_fast_bug_scan(...)` — uses the free trial credit
- `prepare_workflow_test(...)` — multi-step user journey
- `start_regression_check({ runId })` — Fix Check loop after a patch
- `get_live_run_status({ runId })` — poll until isFinished:true
- `get_compact_proof({ runId })` — small (~3 KB) decision-ready proof
- `fetch_proof_packet({ runId })` — full Proof Packet (~12 KB) with aiFixPrompt
- `list_auth_sessions()` — Login Memory metadata only (no session material)
- `chrome_capture_for_job(...)` — local Chrome bridge for real-Chrome parity

Common mistake: agents read "fetch findings and read reports" in old docs
and assume the MCP is read-only. It's not. Agents CAN and SHOULD start
runs from MCP. The CLI and dashboard are alternative entry points, not the
primary one.

## Agent Quickstart (cold start, zero prior knowledge)

If you've never used Testorax before, this is the only section you need to
start. Five steps, one run, one decision.

**1. Start a Fast Bug Scan against a public URL** (Testorax cannot reach
localhost — use a deployed/preview/tunnel URL):

```bash
curl -X POST https://testorax.com/api/runs/start \
  -H "X-Api-Key: <KEY>" \
  -H "Content-Type: application/json" \
  -d '{"mode":"fast_bug_scan","url":"https://your-deployed-app.example.com"}'
```

If your sandbox blocks outbound HTTPS (curl exit 35, schannel errors, sandbox
network disabled), use the `testorax-mcp` MCP server instead — it works
through the agent's MCP transport, not the host's network stack.

**2. Poll status** (`GET /api/runs/<runId>`) until `status='completed'` or
`'failed'`. A typical Fast Bug Scan completes in 30-90 s.
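
The polling step can be sketched as a small helper. This is a minimal sketch, not an official client: `poll_until_finished`, `fetch_status`, and the interval and timeout defaults are hypothetical names and values, and it assumes the run JSON exposes `status` at the top level. `fetch_status` is whatever function performs the `GET /api/runs/<runId>` call in your environment.

```python
import time
from typing import Callable, Mapping

# Terminal states named in the quickstart.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_finished(
    fetch_status: Callable[[], Mapping[str, object]],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, object]:
    """Call fetch_status() until the run reports a terminal status.

    Raises TimeoutError if the run is still in flight after timeout_s.
    """
    waited = 0.0
    while True:
        run = fetch_status()
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if waited >= timeout_s:
            raise TimeoutError(f"run still {run.get('status')!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` keeps the helper easy to exercise without real waiting.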

**3. Read the Compact Proof first** — small (~3 KB), tells you what to do
next:

```bash
curl https://testorax.com/api/runs/<runId>/compact-proof \
  -H "X-Api-Key: <KEY>"
```

The Compact Proof always contains:

- `verdict` (pass/fail/inconclusive/blocked/provider_bug/test_design_issue)
- `failureType` (one of 18 closed-set classifications)
- `runtimeEvidence.network.failedRequests[]` — actual failed HTTP calls with
  url + method + status
- `runtimeEvidence.console.errors[]` — actual console error text
- `nextSafeAction` — plain-language guidance
- `screenshotRecommendation` — when to fetch a screenshot

**4. Decide. The decision tree:**

- `verdict='pass'` and no failures → done. Do NOT claim other untested
  routes are clean.
- `verdict='fail'` → read `runtimeEvidence` and `failureType`. The proof
  IS the bug — patch from it.
- `verdict='inconclusive'` → improve the test before patching. Do not
  patch from inconclusive proof.
- `trustScore < 50` → the run is real but the proof contract is weak (no
  explicit assertion, weak fidelity). **The bug is still real if
  `runtimeEvidence` shows failures.** See "Reading proof correctly" below.
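
The decision tree above can be sketched as one function over the Compact Proof JSON. This is an illustrative sketch under assumptions: the returned action labels are hypothetical, `trustScore` is treated as optional, and the ordering of the low-trust and `inconclusive` branches is one reasonable reading of the rules rather than documented behavior.

```python
LOW_TRUST_THRESHOLD = 50  # threshold named in the decision tree


def decide_next_action(proof: dict) -> str:
    """Map a Compact Proof dict to a coarse next step (labels are hypothetical)."""
    evidence = proof.get("runtimeEvidence") or {}
    failed_requests = (evidence.get("network") or {}).get("failedRequests") or []
    console_errors = (evidence.get("console") or {}).get("errors") or []
    has_failures = bool(failed_requests or console_errors)

    verdict = proof.get("verdict")
    trust = proof.get("trustScore")

    if verdict == "fail":
        # The proof is the bug: read runtimeEvidence and failureType, then patch.
        return "patch_from_proof"
    if has_failures and isinstance(trust, (int, float)) and trust < LOW_TRUST_THRESHOLD:
        # Weak proof contract, but the observed failures are still real.
        return "investigate_runtime_evidence"
    if verdict == "inconclusive":
        return "improve_test_first"
    if verdict == "pass" and not has_failures:
        # Done for the tested scope only; untested routes are not proven clean.
        return "done_tested_scope_only"
    return "follow_next_safe_action"
```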

**5. After patching, run Fix Check (NOT a fresh scan)** — verifies the
same test against the same assertion:

```bash
curl -X POST https://testorax.com/api/runs/<priorRunId>/fix-check \
  -H "X-Api-Key: <KEY>"
```

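Read the verdict from `GET /api/runs/<verifyRunId>/fix-check/result`. Only
`verdict='fixed_verified'` counts as a real fix. `pending` means the verify
run is still queued or running, so wait; any other verdict means the fix is
not verified.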

**That's it.** The rest of this document is reference for specific
features. Come back when you need authenticated discovery, mobile testing,
campaigns, scenario templates, or fix-check loops.

## Reading proof correctly (do not skip)

Misreading proof is the single biggest mistake first-time agents make. Read
this section once.

**Low `trustScore` does NOT mean "no bug to fix."** It means the proof
contract scored low — usually because the scenario had no explicit
assertion or used weak interaction fidelity. The `runtimeEvidence` is
still real and still actionable.

**Concrete example.** A Fast Bug Scan returns:

- `trustScore: 42` (below the 50 threshold)
- `aiFixPrompt: null`
- `runtimeEvidence.network.failedRequests`: 6 entries, all
  `GET /api/auth/me → 401` on a public landing page
- `runtimeEvidence.console.errors`: 6 entries, same root cause

**Wrong conclusion**: "trustScore < 50, aiFixPrompt is null, nothing to
fix, recommend Login Memory."

**Right conclusion**: "6 × 401 from `/api/auth/me` on a page that's
supposed to be public is a real bug. Investigate why that endpoint is
being called on a public page and why it's returning 401 instead of
`{authenticated:false}`."

`aiFixPrompt: null` means the deterministic prompt generator declined to
write a one-shot patch prompt because the run wasn't strong enough proof
on its own. It does NOT mean the engine didn't observe a failure.
**Always read `runtimeEvidence` directly before dismissing a run.**
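
One way to read `runtimeEvidence` directly is to group the failed requests by method, URL, and status, which turns six raw entries into the "6 × 401" observation from the example. This is a minimal sketch; `summarize_failed_requests` is a hypothetical helper, and it assumes each `failedRequests[]` entry carries `url`, `method`, and `status` keys as described in the quickstart.

```python
from collections import Counter


def summarize_failed_requests(proof: dict) -> list[str]:
    """Group failedRequests by (method, url, status), most frequent first."""
    evidence = proof.get("runtimeEvidence") or {}
    failed = (evidence.get("network") or {}).get("failedRequests") or []
    counts = Counter((r.get("method"), r.get("url"), r.get("status")) for r in failed)
    return [
        f"{n} x {method} {url} -> {status}"
        for (method, url, status), n in counts.most_common()
    ]
```

The summary ignores `trustScore` and `aiFixPrompt` on purpose, so a weak proof contract cannot hide an observed failure.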

**`redacted: true` in evidence is a guarantee, not a limitation.**
Testorax never stores response bodies, request bodies, cookie values,
auth headers, or session tokens. The redaction marker means "we saw it,
we didn't store it." You still get url + method + status + duration —
that's enough to act on for HTTP failures.

## When NOT to use Login Memory

Login Memory is for tests that need to be signed in to reach the routes
under test. It is the wrong fix when:

- The failure is on a **public page** (landing, pricing, signup, login
  itself). If the page should be reachable without auth and an auth-shaped
  call fails, fix the caller — don't paper over it with a fake session.
- The failure is a **leftover endpoint** from a previous auth provider
  (Clerk → Lucia migration, Auth0 → custom, etc). Find the caller and
  decide whether the endpoint should exist at all.
- The endpoint should return a clean `{authenticated:false}` 200 for
  unauthenticated visitors but is returning 401. That's a server contract
  bug, not a Testorax problem.

Decision rule: if the route is supposed to be public, Login Memory is
NEVER the answer. Investigate the caller.

## Discovery

Single-fetch entry points for agents discovering Testorax for the first time:

| URL | Purpose |
|---|---|
| `https://testorax.com/agents.md` | This file. Long-form agent guide. |
| `https://testorax.com/llms.txt` | Same content, plain-text format. |
| `https://testorax.com/sitemap.xml` | Crawler / agent URL inventory. |
| `https://testorax.com/robots.txt` | Crawler policy. Public agent JSON endpoints are explicitly allowed. |
| `https://testorax.com/api/agents/index.json` | Consolidated machine-readable agent surface index (versions, MCP+CLI npm packages, public endpoints, doc links). Public, no auth, ~3 KB. |
| `https://testorax.com/api/docs/index.json` | Structured documentation index (30 entries; per-slug fetch via `/api/docs/<slug>.json`). |
| `https://testorax.com/docs` | Human-readable docs hub. |

Public agent JSON endpoints (no authentication required, agent-readable):

| URL | Purpose |
|---|---|
| `GET /api/pricing` | Run-credit pricing model. |
| `GET /api/templates` | Workflow / CRUD test templates with input shapes + safety notes. |
| `GET /api/scenario-templates` | 45 scenario templates. |
| `GET /api/browser-capabilities` | Capability matrix declaring what is and is not supported. |
| `GET /api/docs/index.json` | All documentation slugs with summaries + tags. |
| `GET /api/docs/<slug>.json` | Per-slug structured doc fetch. |

Authenticated agent endpoints (API key required unless noted, customer-scoped):

| URL | Auth | Purpose |
|---|---|---|
| `GET /api/runs/:id/proof-pack.json` | API key OR public-by-runId | Agent-readable Proof Packet (~12 KB compact, 1 MB cap). |
| `POST /api/runs/:id/fix-check` | API key (`runs:start` scope) | Trigger same-test verification (1 run credit). |
| `GET /api/runs/:id/fix-check/result` | API key (`runs:read`) OR admin | Verdict envelope. |
| `POST /api/runs/start` + `/api/runs/bypass` | API key (`runs:start`) | Start a run. |

Agent integrations:

- **MCP server (npm)**: `testorax-mcp` — install via `npm install -g testorax-mcp`. Stdio transport. Set `TESTORAX_EMAIL` + `TESTORAX_API_KEY` env vars.
- **CLI (npm)**: `testorax` — install via `npm install -g testorax`. Same surface as MCP.
- **REST API**: documented at `/api-docs` and `/integrations`.

What Testorax publishes for discovery:

- **OpenAPI 3.1.0** at `/openapi.json` (minimal spec covering core endpoints — pricing, agents discovery, runs/start, report.json, proof-pack.json, compact-proof, fix-check).
- **MCP discovery** at `/.well-known/mcp.json` (302 redirect to the canonical server card at `/.well-known/mcp/server-card.json`).
- **Stable public demo run** at `/api/runs/free-public-demo` returning a JSON pointer to the canonical demo runId. Use this when your environment cannot run terminal commands.
- **Hosted MCP** at `POST /mcp` (streamable-http, JSON-RPC 2.0, requires `X-Api-Key`).
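
As a sketch of what a hosted MCP call might look like, the helper below builds (but does not send) a JSON-RPC 2.0 `tools/list` request with the `X-Api-Key` header. Assumptions: `build_tools_list_request` is a hypothetical helper, the base URL is taken to be `https://testorax.com`, and the `Accept` header follows the usual streamable-http convention. Whether the server expects an `initialize` handshake first is not stated here.

```python
import json
import urllib.request


def build_tools_list_request(
    api_key: str, base_url: str = "https://testorax.com"
) -> urllib.request.Request:
    """Build a JSON-RPC 2.0 tools/list request for the hosted MCP endpoint."""
    body = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    return urllib.request.Request(
        url=f"{base_url}/mcp",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
            "X-Api-Key": api_key,
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; that step is left out so the sketch stays side-effect free.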

What Testorax does NOT publish:

- No ChatGPT plugin manifest (`/.well-known/ai-plugin.json` returns 404). Testorax does not implement the ChatGPT plugin protocol.
- No agent-facing OAuth. Authentication is API key (`X-Api-Key` header) for hosted MCP / REST; `TESTORAX_EMAIL` + `TESTORAX_API_KEY` env vars for stdio MCP.
- No PWA manifest.

`/.well-known/security.txt` is shipped — see that file for the security contact policy.

## Agent Fix Loop

Testorax tests the app, creates a Proof Packet, the coding agent reads it,
patches the app, then Fix Check verifies the same issue against the same
assertion. The agent only claims "fixed" when Fix Check returns
`verdict=fixed_verified`.

### 5-step workflow

1. Start a Vibe Test (Fast Bug Scan, Workflow Test, or Deep CRUD E2E).
2. Fetch the Proof Packet — MCP `fetch_proof_packet(runId)`, CLI
   `testorax proof <runId>`, or `GET /api/runs/<runId>/proof-pack.json`.
3. Patch only what proof supports. If `aiFixPrompt` is null, do NOT modify
   production code from this run; read `aiFixPromptBranch` for the reason.
4. Trigger Fix Check: `POST /api/runs/<runId>/fix-check` (1 run credit).
5. Read `GET /api/runs/<verifyRunId>/fix-check/result` and accept only
   `verdict=fixed_verified`. Anything else means the fix is not real yet.

## Canonical endpoints

| Endpoint | Purpose |
|---|---|
| `GET /api/runs/:id/proof-pack.json` | Agent-readable Proof Packet (1 MB cap; ~12 KB compact). |
| `POST /api/runs/:id/fix-check` | Trigger same-test verification (1 run credit; idempotent on in-flight). |
| `GET /api/runs/:id/fix-check/result` | Agent-facing verdict envelope. |
| `GET /api/runs/:id/report.json` | Full structured report (preserved). |
| `GET /api/runs/:id/verify-fixes/results` | Multi-target Verify Fixes ledger (legacy, preserved). |

## MCP tools currently available

- `fetch_proof_packet(runId, mode?)` — canonical Proof Packet reader.
- `fetch_proof_pack(runId)` — alias of `fetch_proof_packet` (legacy name).
- `start_fast_bug_scan` / `start_regression_check` / `start_workflow_test`
  / `prepare_crud_e2e_config` — start tools.
- `list_runs` / `run_status` / `run_timeline` — read tools.
- `verify_fixes_results(runId)` — Verify Fixes ledger reader.

**Deferred:** `start_fix_check` and `get_fix_check_result` MCP wrappers
are on the roadmap. Use the REST endpoints directly until then.

## CLI commands currently available

```
testorax proof <runId>             # fetch the Proof Packet (concise summary)
testorax proof <runId> --json      # full Proof Packet JSON to stdout
testorax fetch-proof <runId>       # alias of `proof`
testorax fix <runId>               # print AI Fix Prompt or branch-specific cautious null message
testorax scan <url>                # start a Fast Bug Scan
testorax regression <priorRunId>   # Verify Fixes (legacy; Fix Check is REST-only this batch)
testorax splash                    # show the CLI splash screen (alias: demo, welcome)
```

The splash command (`testorax splash` / `demo` / `welcome`) prints a
black/yellow ASCII trust banner. Pure local rendering — no network call,
no D1 read/write, no run quota burn. A static screenshot of the splash
is also published at `/images/cli-splash.png` for embedding in READMEs
and devrel content.

**Deferred:** `testorax fix-check <runId>` and
`testorax fix-check-result <verifyRunId>` are deferred. Use curl with the
saved API key:
```
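API_KEY="$(jq -r .apiKey ~/.testorax/config.json)"   # saved CLI key
curl -X POST -H "X-Api-Key: $API_KEY" \
  https://testorax.com/api/runs/<runId>/fix-check
# The result endpoint also needs the key (`runs:read` scope)
curl -H "X-Api-Key: $API_KEY" \
  https://testorax.com/api/runs/<verifyRunId>/fix-check/result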
```

## Fix Check verdicts

| Verdict | Meaning | Treat as |
|---|---|---|
| `fixed_verified` | Every original target passed; same-assertion lock satisfied; proof intact. | OK to claim fixed. |
| `still_failing` | At least one original target still failed in the verify run. | Do not claim fixed. |
| `cannot_verify_no_lock` | Original assertion missing or unparseable. | Do not claim fixed without an assertion lock. |
| `scope_shrunk` | Failing scenario missing from the verify run. | Restore the failing scenario before claiming fixed. |
| `proof_disappeared` | Targets passed but the original proof surface is gone. | Treat as not-fixed; do not remove proof evidence. |
| `unable_to_rerun` | Verify run had nothing to verify. | Re-run with a fresh failed run. |
| `inconclusive` | Mixed historical / unable rows. | Do not modify production code from this verify run. |
| `pending` | Verify run still queued or running. | Wait. |
| `assertion_weakened` | Reserved enum slot. | Not currently emitted (deferred). |
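
The verdict table can be encoded as a lookup so an agent never claims "fixed" by accident. This is a minimal sketch: the action labels and helper names are hypothetical, and treating unknown verdicts and the reserved `assertion_weakened` slot as not-fixed is a conservative assumption.

```python
# Verdict -> coarse agent action, following the "Treat as" column.
FIX_CHECK_ACTIONS = {
    "fixed_verified": "claim_fixed",
    "still_failing": "keep_fixing",
    "cannot_verify_no_lock": "add_assertion_lock",
    "scope_shrunk": "restore_failing_scenario",
    "proof_disappeared": "treat_as_not_fixed",
    "unable_to_rerun": "rerun_from_fresh_failed_run",
    "inconclusive": "do_not_modify_production_code",
    "pending": "wait",
    "assertion_weakened": "treat_as_not_fixed",  # reserved; not currently emitted
}


def fix_check_action(verdict: str) -> str:
    """Return the action for a verdict; unknown verdicts are never 'fixed'."""
    return FIX_CHECK_ACTIONS.get(verdict, "treat_as_not_fixed")


def may_claim_fixed(verdict: str) -> bool:
    """Only fixed_verified permits claiming the issue is fixed."""
    return verdict == "fixed_verified"
```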

## Do-not-claim rules

- Do not claim fixed before `Fix Check verdict=fixed_verified`.
- Do not modify production code when `aiFixPrompt` is null.
- Do not weaken assertions or change selectors.
- Do not delete the failing scenario (detected as `scope_shrunk`).
- Do not blame the app on test-design or provider-environment branches.
- Do not claim untested flows or pages are clean from a passing run.

## Safe workflow (copy-paste)

```
You are a coding agent fixing a Testorax-detected issue.

1. Fetch the Proof Packet for run <RUN_ID> via MCP fetch_proof_packet or
   `testorax proof <RUN_ID> --json`.
2. Read whatIsProven, whatIsNotProven, and doNotClaim before patching.
3. If aiFixPrompt is null, STOP. Explain why instead of patching.
4. If aiFixPrompt is present, apply the minimal change it describes.
   Do NOT weaken assertions, change selectors, or delete the failing scenario.
5. Trigger Fix Check: POST /api/runs/<RUN_ID>/fix-check.
6. Read GET /api/runs/<VERIFY_RUN_ID>/fix-check/result.
7. Only claim "fixed" when verdict='fixed_verified'.
```

## Structured agent docs (JSON)

- Index: https://testorax.com/api/docs/index.json
- Agent Fix Loop entry: https://testorax.com/api/docs/agent-fix-loop.json
- Full walkthrough: https://testorax.com/docs/agent-fix-loop

## Login Memory (authenticated apps)

Authenticated dashboards need a saved login. There are now TWO paths:

1. **Customer-driven (dashboard)** — customer signs in at `/account/login-memory` and pastes a Playwright `storageState` (existing flow, unchanged).
2. **Agent-driven (CLI / MCP / API)** — the agent creates a Login Memory profile programmatically. Testorax drives a real browser login on the cloud-side Hetzner runner against a sandbox/staging URL, captures the resulting session encrypted at rest, and returns the `authSessionId`. The password is consumed once and destroyed.

The agent then references the saved login by id only. The agent never sees cookies, headers, or session values.

### Agent flow — create + use

```bash
# 1. Create the profile (password from stdin — never argv, never env var)
op read "op://Private/sandbox/password" | testorax safe-login create --password-stdin \
  --login-url https://app-sandbox.example.com/login \
  --label "Sandbox dashboard" \
  --username alice@example.com \
  --json

# Response: { authSessionId: "login_xxxxxxxxxxxxxxxxxxxxxx", profileStatus: "pending_capture", ... }

# 2. Poll until active
testorax safe-login get login_xxxxxxxxxxxxxxxxxxxxxx

# 3. Run authenticated_smoke with the new profile
testorax authenticated-smoke --auth-session login_xxxxxxxxxxxxxxxxxxxxxx \
  --routes /admin --routes /admin/users

# 4. Fetch proof
testorax proof <runId>

# 5. Revoke when done
testorax safe-login revoke login_xxxxxxxxxxxxxxxxxxxxxx
```

### Agent rules (do-not list)

- Do not pass passwords on the command line (`--password` is rejected; only `--password-stdin`).
- Do not put passwords in environment variables (process tree leaks via `/proc/<pid>/environ`).
- Do not use `echo "secret" | ...` if shell history is enabled — pipe from a file or secret manager.
- Do not ask the user to paste cookies, headers, or session JSON into chat.
- Do not echo, print, or log the password.
- Do not use `authSessionId` from someone else's account.
- Do not target production-looking hosts. Sandbox/staging/test/local hosts only — operator-approved override via `auth.sandboxAllowedDomains`.

### Approved sandbox hosts (Batch 1)

`*-sandbox.*`, `*-staging.*`, `*.local`, `*.dev`, `*.test`, `localhost`, `127.0.0.1`, `example.com` (and subdomains), `herokuapp.com` (canary). Anything else → 400 `unsafe_production_login_target`.
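
An agent can pre-check a login host against this list before calling `safe-login create`. The sketch below is a client-side approximation only: `looks_like_approved_sandbox` is a hypothetical helper, the server's real matching rules may differ, and since the list does not say whether subdomains of `herokuapp.com` count, the sketch takes the conservative reading and matches only the bare host.

```python
from fnmatch import fnmatch

WILDCARD_PATTERNS = ("*-sandbox.*", "*-staging.*", "*.local", "*.dev", "*.test")
EXACT_HOSTS = {"localhost", "127.0.0.1", "example.com", "herokuapp.com"}


def looks_like_approved_sandbox(host: str) -> bool:
    """Approximate the approved-sandbox-host list for a bare hostname."""
    host = host.strip().lower().rstrip(".")
    # example.com is listed together with its subdomains.
    if host in EXACT_HOSTS or host.endswith(".example.com"):
        return True
    return any(fnmatch(host, pattern) for pattern in WILDCARD_PATTERNS)
```

A `False` result means the request would likely be refused with `unsafe_production_login_target`, so the agent can stop before sending a password anywhere.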

### MCP tools

- `list_auth_sessions` — metadata only.
- `get_auth_session` — metadata only.
- `revoke_auth_session` — permanent revoke + delete.
- `create_auth_profile` — drive a real browser login + capture session (auth_sessions:write).

There is no decrypt tool. There is no export tool. Saved-session contents never leave the test-run engine. `create_auth_profile` returns `{authSessionId, runId, profileStatus, label, origin, expiresAt}` only — never the password, cookies, headers, storageState, ciphertext, or capture token.

### Reserved: passwordRef

`credentials.passwordRef` (`fixture://...`) is reserved for a future operator-bound resolver. In Batch 1 the runner refuses with `passwordRef_resolver_not_implemented`. Use `--password-stdin` / the MCP `password` field instead.

### Scopes

- `auth_sessions:read` for list / get.
- `auth_sessions:write` for create / revoke / create_auth_profile.
- Neither scope is in `AGENT_RECOMMENDED_SCOPES`.
- `proof:read` does NOT grant Login Memory access.

### Full walkthrough

https://testorax.com/docs/login-memory

## Chrome Parity backends (cascade)

Chrome parity compares Testorax runner truth against real-browser truth. Backends are tried in priority order; the first reachable one wins:

1. `login_memory` — saved Login Memory profile
2. `cdp_attach` — ATTACH to your already-running Chrome via CDP (LOCAL-ONLY, http://127.0.0.1:9222 / localhost / [::1] only)
3. `hosted_chrome` — Testorax-hosted Chromium (partial; default-engine path)
4. `chrome_extension` — public Chrome Web Store extension v0.2.0
5. `claude_chrome_mcp` — pre-existing fallback (often disconnected)

### Local CDP attach (when local Claude-in-Chrome MCP is offline)

```
# Start Chrome with CDP enabled (Windows shown; macOS and Linux variants follow as comments):
chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\temp\testorax-cdp-profile"
# macOS:    /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/testorax-cdp-profile
# Linux:    google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/testorax-cdp-profile

# Open the target URL in that Chrome window, then:
testorax chrome-parity check https://your-app.example.com --cdp http://127.0.0.1:9222 --goal route_presence --json

# Compare against a specific run (selector_presence is the most useful goal for "agent said selector_missing"):
testorax chrome-parity check https://your-app.example.com --cdp http://127.0.0.1:9222 \
  --goal selector_presence --selector "[data-testid=save]" --run-id <runId> --json

# For authenticated dashboards: log in manually in your CDP Chrome window first, then:
testorax chrome-parity check https://app.example.com/dashboard --cdp http://127.0.0.1:9222 \
  --goal auth_reachability --expect-auth --run-id <runId> --json
```

Returns `{backendUsed, connected, chromeUrl, runnerUrl, targetTitle, targetIdHash, pageTargetCount, verdict, proofScope, evidence, backendAttempts[], comparison: {goal, verdict, confidence, reason, matchedSignals[], mismatchedSignals[], missingSignals[], doNotClaim[]}, nextCommand}`.

### Comparison verdicts (closed set)

- `chrome_confirmed` — Chrome supports the runner claim (route, selector, or auth).
- `runner_mismatch` — Chrome and runner disagree on the target.
- `chrome_only_pass` — selector exists in Chrome but runner reported it missing → likely stale deploy / auth state, NOT an app bug.
- `runner_only_failure` — runner reported failure but Chrome doesn't reproduce → check runner side first.
- `selector_mismatch` — both runner and Chrome agree the selector is missing.
- `auth_session_mismatch` — Chrome shows a login wall while runner expected authenticated route.
- `app_bug_confirmed` — runner failed AND Chrome shows the same console errors → likely a real app bug (verify error details before patching).
- `inconclusive` — evidence insufficient. NOT a green light to patch.
- `blocked` — CDP couldn't connect or no target matched.

Every non-`chrome_confirmed` verdict carries a `doNotClaim` list.

**Safety:**
- CDP URL must be localhost (127.0.0.1 / localhost / [::1]) with explicit port.
- Public hosts, LAN IPs, https:// URLs, embedded credentials, query/fragment all refused.
- Observation only — no clicks, fills, or submits.
- Output NEVER includes cookies, headers, tokens, storageState.
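
The CDP URL rules above can be mirrored in a local pre-check. This is a sketch of the stated rules, not the CLI's actual validator: `cdp_url_problem` and its reason strings are hypothetical.

```python
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def cdp_url_problem(cdp_url: str):
    """Return None if the URL satisfies the stated rules, else a reason string."""
    try:
        parts = urlsplit(cdp_url)
        port = parts.port  # raises ValueError on a non-numeric port
    except ValueError:
        return "unparseable URL or port"
    if parts.scheme != "http":
        return "scheme must be http"
    if parts.username is not None or parts.password is not None:
        return "embedded credentials are refused"
    if parts.hostname not in LOOPBACK_HOSTS:
        return "host must be 127.0.0.1, localhost, or [::1]"
    if port is None:
        return "explicit port is required"
    if parts.query or parts.fragment:
        return "query and fragment are refused"
    return None
```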

**When to use:** local Claude-in-Chrome MCP disconnected; you need deterministic browser-grade proof from your own Chrome; CI agent on the same box as Chrome.

**When NOT to use:** when the target requires a `chrome_confirmed` verdict. Batch 1 is observation-only and issues `inconclusive` with an honest reason instead. For a full parity comparison, use an AB-5G saved Login Memory profile together with Chrome Extension v0.2.0.

### MCP tool

- Stdio MCP: `check_chrome_parity_cdp({ targetUrl, cdpUrl })`. Returns the same backend-result envelope.
- Hosted MCP does **not** register this tool — it can't reach the user's localhost.

## Authenticated Smoke Test (admin/dashboard mode)

For dashboards, admin panels, or any logged-in surface that public Fast Bug Scan can't reach.

**One command:**

```bash
npx testorax auth-smoke https://dashboard.example.com \
  --session login_xxxxxxxxxxxxxxxxxxxxxx \
  --route /admin --route /admin/users --route /admin/settings \
  --json --no-open
```

**What it does:**
- Loads a saved Login Memory profile (read-only, encrypted at rest).
- Visits each `--route` you list (paths or full URLs).
- Captures per-route HTTP status, console errors, failed network requests, blank-content detection, document title, screenshot.
- Classifies each route: `ok / blank_content / page_error / fetch_failed / auth_lost / timeout / navigation_error / console_error_only`.
- Stops on first `auth_lost` (saves credit when the profile expired).
- **NEVER clicks anything. NEVER submits forms. NEVER prints raw cookies/passwords.**

**Pricing:** 1 run credit ($1.99 PAYG) per audit, regardless of route count up to 25.

**Prerequisite:** the user creates a Login Memory profile at `https://testorax.com/account/login-memory`. The agent passes the resulting `login_<22>` id as `--session`. The agent should NEVER ask for the user's password, cookies, or storageState directly.

### MCP

Tool: `start_authenticated_smoke`. Args: `{authSessionId, routes[], baseUrl?, viewport?}`.

### REST

`POST /api/runs/start` with `{mode: "authenticated_smoke", authSessionId, routes[], baseUrl?}`.

Refusal codes (closed set): `invalid_authSessionId / no_routes / too_many_routes / invalid_route / auth_session_not_owned_or_not_found / payment_required`.
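
A client can build the REST body and catch the obvious refusals before spending a request. This is a sketch under assumptions: `build_authenticated_smoke_payload` is a hypothetical helper, the 25-route limit is inferred from the pricing note, the id check only verifies the `login_` prefix plus 22 characters, and the server remains the source of truth for every refusal code.

```python
MAX_ROUTES = 25  # assumed from "up to 25" in the pricing note
ID_PREFIX = "login_"
ID_SUFFIX_LENGTH = 22


def build_authenticated_smoke_payload(auth_session_id, routes, base_url=None):
    """Return the JSON body for POST /api/runs/start, or raise ValueError(code)."""
    if not (
        auth_session_id.startswith(ID_PREFIX)
        and len(auth_session_id) == len(ID_PREFIX) + ID_SUFFIX_LENGTH
    ):
        raise ValueError("invalid_authSessionId")
    if not routes:
        raise ValueError("no_routes")
    if len(routes) > MAX_ROUTES:
        raise ValueError("too_many_routes")
    for route in routes:
        # Routes may be paths or full URLs.
        if not route.startswith(("/", "http://", "https://")):
            raise ValueError("invalid_route")
    payload = {
        "mode": "authenticated_smoke",
        "authSessionId": auth_session_id,
        "routes": list(routes),
    }
    if base_url is not None:
        payload["baseUrl"] = base_url
    return payload
```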

### Agent rules

- Do not ask the user to paste a password — direct them to `/account/login-memory`.
- `auth_lost_early` means the user's session expired. Tell them to refresh the Login Memory profile, NOT that the app is broken.
- `console_error_only` is often non-fatal. Don't claim a bug unless evidence is conclusive.
- If `some_routes_failed`, fetch the proof-pack BEFORE proposing a code change.

### Full walkthrough

https://testorax.com/docs/authenticated-smoke

## Watch Test Live (live observation)

Watch a run as it executes — current status, current step, recent proof events, latest screenshot. Live observation, **not** video replay.

### Endpoint

`GET /api/runs/:id/live` returns a compact JSON envelope. **Authenticated only: there is no public-by-runId access.** Accepted credentials: `X-Admin-Key`, dashboard session cookie, or `X-Api-Key` with `proof:read` scope. Non-admin scoped keys are cross-tenant guarded. Missing or invalid credentials return `401 unauthorized`; a valid key used on another customer's runId returns `403 forbidden`. The auth here is stricter than for the public Proof Packet because the live envelope can include the current URL and step of an in-progress run.

### MCP / CLI

- MCP: `get_live_run_status({ runId })` (hosted + stdio).
- CLI: `testorax run-live <runId>` / `testorax run-live <runId> --watch` / `testorax run-live <runId> --json`.

### Agent rules

- Do not claim "fixed" from a passing live status. Only Fix Check verdict `fixed_verified` counts.
- Do not claim coverage from event counts. Read `whatIsProven` in the Proof Packet.
- Do not patch production code from in-flight events. Wait for `isFinished: true`.
- `latestScreenshot.url` is auth-proxied. Never expect a raw R2 public link.

### What this surface never exposes

Cookies, headers, tokens, passwords, saved-session contents, encryption material, provider IDs, billing fields. Same redaction surface as Proof Packet.

Full walkthrough: https://testorax.com/docs/watch-test-live

## Vibe Browser Engine (capability matrix)

Testorax declares the deterministic list of browser engines and capabilities at `GET /api/browser-capabilities` (public, no auth).

### Default engine

`chrome_managed` — Hetzner-hosted Chromium running Playwright. Used for every accepted Vibe Test today.

### Capabilities you can rely on today

screenshots, cookies + storage state, Login Memory injection, request timeline, console error capture, visual proof, mobile emulation, live view polling.

### Capabilities NOT available

- `ios_safari` — false (planned, not implemented)
- `extension_testing` — false
- `video_replay` — false (permanently scoped out unless an approved paid-media batch ships first)
- `cloud_provider_activated` — false (no Browserbase / Browserless / Kernel)
- `cdp_attach` — false
- `file_upload` / `file_download` — false (not yet wired in scenario runner)

### Agent rules

- Before claiming "Testorax does X", check the matrix. If the capability is `false`, do not claim it.
- Engine selection on run start: pass `engine` and/or `capabilitiesRequired`. Unsupported request → HTTP 400 `browser_capability_unsupported` with `capabilitiesMissing` list. No run created. No credit deducted. No silent downgrade.
- MCP: `get_browser_capabilities()`. CLI: `testorax browser-capabilities`.
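
Checking the matrix before making a claim can be as small as the sketch below. Assumptions: `missing_capabilities` is a hypothetical helper, and the matrix is treated as a flat name-to-boolean mapping, which may not match the real shape of the `/api/browser-capabilities` response. Anything absent or not exactly `true` counts as unsupported.

```python
def missing_capabilities(matrix: dict, required: list[str]) -> list[str]:
    """Return the required capability names the matrix does not declare as true."""
    return [name for name in required if matrix.get(name) is not True]
```

For example, with a matrix of `{"screenshots": True, "video_replay": False}`, requiring `["screenshots", "video_replay", "file_upload"]` yields `["video_replay", "file_upload"]`, so neither capability should be claimed or requested.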

Full walkthrough: https://testorax.com/docs/vibe-browser-engine

## Authenticated scenarios (proof-flow hardening)

Run authenticated dashboard tests on sandbox/staging hosts. Two paths:

- **Path A**: log in inside the scenario via `navigate` → `fill` → `click` → `wait_for_response` → `clickByTestId`. Use the **authenticated-dashboard-flow** template.
- **Path B**: inject a sandbox session cookie via `setCookie`. Requires `auth.allowSessionInjection: true` AND a sandbox allowlist.

### Refusal before billing

`setCookie` / `clearSession` on a production-looking host → HTTP 400 `auth_scenario_action_unsafe` with structured code (`unsafe_production_domain`, `session_injection_not_permitted`, etc). **No run created. No credit deducted.**

### Cookie value safety

Cookie values are masked everywhere — report JSON, Proof Packet, export bundle, CLI/MCP, logs. Never echoed, never returned through any read path.

### Stable selectors

Use `clickByTestId` / `fillByTestId` instead of CSS where a `data-testid` exists. Report carries `selectorStrategy="data-testid"`, `testId`, `resolvedSelector`, found/visible/enabled, near-miss candidates.

### Proof bundle

`GET /api/runs/:id/export.json` returns a single JSON envelope: metadata + proof.md + steps.json + console.json + network.json + screenshots[] (auth-proxied URLs) + proof-packet.json + redaction stats. Same auth model as Proof Packet. 1 MB cap. ZIP deferred.

### Agent rules

- Do not paste real cookie values into chat. Reference a `credentialsRef` or use Login Memory.
- Do not use production-host cookies.
- Do not claim "fixed" from a passing run alone — run Fix Check; only `fixed_verified` counts.
- Do not claim coverage on untested authenticated routes.

Full walkthrough: https://testorax.com/docs/authenticated-scenarios

## Run History Intelligence (Proof Memory)

Deterministic read-only classifier over Testorax run history. Surfaces recurring issues, weak passes, false-pass risks, stale deploys, selector failures, and proof-quality patterns. **No LLM calls. No mutation of historical data.**

### Endpoints

- `GET /api/runs/:id/intelligence` — single run.
- `GET /api/account/run-intelligence?domain=&days=` — caller history (default 30 days, max 90; cap 200 runs analyzed).

Authenticated only — `X-Admin-Key`, dashboard session cookie, or `X-Api-Key` with `proof:read` scope. Cross-tenant guarded by email match.
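
A sketch of building the history query with the documented default (30 days) and maximum (90 days). `run_intelligence_path` is a hypothetical helper, and clamping on the client is an assumption; how the server treats out-of-range `days` values is not stated here.

```python
from urllib.parse import urlencode

DEFAULT_DAYS = 30
MAX_DAYS = 90


def run_intelligence_path(domain=None, days=DEFAULT_DAYS) -> str:
    """Build the /api/account/run-intelligence path with days clamped to 1..90."""
    days = max(1, min(int(days), MAX_DAYS))
    params = {}
    if domain:
        params["domain"] = domain
    params["days"] = days
    return "/api/account/run-intelligence?" + urlencode(params)
```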

### Classifications

`product_bug`, `scenario_bug`, `selector_missing`, `assertion_missing`, `weak_pass`, `false_pass_risk`, `stale_deploy_suspected`, `auth_issue`, `network_issue`, `console_error_issue`, `missing_testid`, `build_marker_missing`, `proof_weakened`, `inconclusive_weak_proof`, `provider_bug`, `clean_with_sufficient_proof`, `clean_but_low_proof`.

### MCP / CLI

- MCP: `get_run_intelligence({runId})` / `get_history_intelligence({domain?, days?})`.
- CLI: `testorax run-intelligence <runId>` / `testorax history-intelligence [--domain <host>] [--days N]`.

### Agent rules

- `weak_pass` is NOT a proven product bug. It means the run did not assert anything.
- `stale_deploy_suspected` is a diagnostic, not a verdict. Verify the deploy is fresh before acting.
- High `weakPassRisk` does not mean the app is broken. It means the proof was thin.
- Cluster signatures are hints, not proofs.
- Use Fix Check to validate a fix. Run-intelligence is observation only.

Full walkthrough: https://testorax.com/docs/run-history-intelligence

## Compact Proof Summary (Agent Compact Mode)

Small (~1–3 KB) agent-first JSON summary of a run. Read this **before** downloading the full report or proof packet — it tells you the verdict, failure type, failed step, trust warning, and what to fetch next.

### Endpoint

- `GET /api/runs/:id/compact-proof` — authenticated only. Same auth model as `/api/runs/:id/intelligence`: `X-Admin-Key`, dashboard session cookie, or `X-Api-Key` with `proof:read` scope. Cross-tenant guarded.

### Recommended agent loop

1. Fetch compact proof.
2. If `verdict=fail` and the reason is clear, patch the exact issue.
3. If UI / selector / auth / visual uncertainty exists, fetch `latestScreenshot.url` (or `markedUrl`).
4. If evidence is weak, fetch `proofPacketUrl`.
5. After patching, run Fix Check.
6. Do not claim "fixed" until Fix Check returns `verdict=fixed_verified`.

### MCP / CLI

- MCP: `get_compact_proof({runId})` — registered in both hosted and stdio MCPs.
- CLI: `testorax compact-proof <runId>` (alias `testorax proof-summary <runId>`). Flags: `--json`, `--with-links`, `--no-fix-prompt`.

### Agent rules

- Compact proof is **not** a replacement for a full audit bundle.
- Do not claim "fixed" until Fix Check passes.
- Do not weaken assertions or delete the failing scenario.
- Do not patch production code from inconclusive proof.
- A passing compact proof only covers the tested scenario; do not generalise.

Full walkthrough: https://testorax.com/docs/compact-proof

## Agent Visual Proof Loop (Screenshot-on-Demand)

Compact proof carries a `screenshotRecommendation` block that tells you when to fetch a screenshot and which one (latest vs marked), and caps the loop so agents do not over-fetch screenshots. **No video replay. No session recording. No browser takeover.** Stage A is metadata-backed; real overlay PNG rendering is deferred to Stage B.

### Loop

1. Fetch compact proof.
2. Read `screenshotRecommendation.recommended`.
3. If `reason` ∈ {`selector_missing`, `testid_missing`, `page_mismatch`}, fetch the marked screenshot.
4. For `auth_state_unclear` / `stale_deploy_suspected` / `visual_issue`, fetch latest.
5. Stop after `budget.maxScreenshotsForAgentLoop` (default 2).
6. If `noScreenshotAvailable` is true, do not retry — rerun with screenshot capture if visual evidence is needed.
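
The loop above can be sketched as a single choice function. Assumptions: `choose_screenshot` and its return labels are hypothetical, `noScreenshotAvailable` and `reason` are read from the `screenshotRecommendation` block itself (their exact location is not spelled out here), and any recommended reason outside the marked set falls back to the latest screenshot.

```python
MARKED_REASONS = {"selector_missing", "testid_missing", "page_mismatch"}
DEFAULT_BUDGET = 2  # documented default for maxScreenshotsForAgentLoop


def choose_screenshot(recommendation: dict, already_fetched: int = 0) -> str:
    """Return 'marked', 'latest', or 'none' for the next screenshot fetch."""
    budget = (recommendation.get("budget") or {}).get(
        "maxScreenshotsForAgentLoop", DEFAULT_BUDGET
    )
    if recommendation.get("noScreenshotAvailable"):
        return "none"  # do not retry; rerun with capture if visuals are needed
    if not recommendation.get("recommended"):
        return "none"
    if already_fetched >= budget:
        return "none"  # budget exhausted
    if recommendation.get("reason") in MARKED_REASONS:
        return "marked"
    return "latest"
```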

### Endpoints

- `GET /api/runs/:id/screenshots/latest` — JSON pointer (auth-proxied PNG URL).
- `GET /api/runs/:id/screenshots/latest/marked` — annotated metadata.
- `GET /api/runs/:id/screenshots/:nameWithExt/image` — `image/png` bytes (auth-proxied).

### MCP / CLI

- MCP: `get_compact_proof({runId})` (includes recommendation), `get_screenshot_recommendation({runId})`, `get_latest_screenshot({runId})`, `get_marked_screenshot({runId, screenshotId})`.
- CLI: `testorax screenshot-latest <runId>`, `testorax screenshot-marked <runId>`, `testorax compact-proof <runId>`.

### Agent rules

- Do not fetch screenshots every step — the budget exists for a reason.
- Do not claim "screenshot proof" unless the screenshot was actually fetched and viewed.
- Do not assume a screenshot exists — check `noScreenshotAvailable`.
- Pure network/console failures: skip the screenshot, inspect `export.json` instead.
- Video, session replay, and live viewport streaming are not supported.

Full walkthrough: https://testorax.com/docs/agent-visual-proof-loop

## Inline screenshot rendering for Claude Code Desktop

`get_latest_screenshot` and `get_marked_screenshot` accept `returnImage: true`. When set, the tool emits an MCP image content block (base64 PNG + `mimeType: image/png`) alongside the text metadata. Claude Code Desktop renders the image inline. Default is metadata-only to save tokens.

### Recommended visual loop

1. Call `get_compact_proof` — read `screenshotRecommendation.recommended`.
2. If true and reason is selector/testid/page mismatch, call `get_marked_screenshot({ runId, returnImage: true })`.
3. Otherwise call `get_latest_screenshot({ runId, returnImage: true })`.
4. If the image does not render inline (older client or transport limit), use the CLI fallback: `testorax screenshot-latest <runId> --open` saves under `.testorax/screenshots/` and opens it.
5. Do not claim visual proof unless the screenshot was actually fetched or saved.
6. Do not exceed `screenshotRecommendation.budget.maxScreenshotsForAgentLoop` (default 2).

### CLI local-file fallback

- `testorax screenshot-latest <runId> --output <file.png>` — explicit save path.
- `testorax screenshot-latest <runId> --open` — saves to `.testorax/screenshots/<runId>-latest.png` and opens it.
- `testorax screenshot-marked <runId> --output <file.png>` — saves the underlying PNG bytes (Stage A.B: real overlay PNG when annotated metadata exists; falls back to latest unmarked PNG otherwise).
- `testorax screenshot-marked <runId> --open` — saves to `.testorax/screenshots/<runId>-marked.png` and opens it.

CLI output always says `image saved to: <absolute path>` and `open this file to inspect the screenshot`. It never prints raw R2 URLs.

## Preflight / Build Freshness Guard

Scenario-level preflight catches **stale-deploy / wrong-route / unauthenticated / missing-testid** problems BEFORE deep scenario steps run. Saves credits, saves agent tokens, and prevents you from debugging app code when the real failure is a stale build.

### Sample

```
{
  "preflight": {
    "enabled": true,
    "failFast": true,
    "expectedUrlIncludes": "/editor",
    "requiredTestIds": ["editor-title-input", "editor-save-draft", "editor-publish"],
    "expectedBuildMarker": "ORD_SANDBOX_BUILD_2026-05-04",
    "requireAuthenticated": true,
    "authCheck": { "type": "url_not_login", "loginPathIncludes": "/login" },
    "criticalApiChecks": [{ "method": "GET", "path": "/api/auth/me", "expectedStatus": 200 }],
    "failOnConsoleErrors": true,
    "maxSeconds": 15
  }
}
```

### Verdicts

`preflight_passed`, `preflight_failed`, `stale_deploy_suspected`, `required_testid_missing`, `build_marker_missing`, `asset_hash_mismatch`, `auth_issue`, `route_mismatch`, `critical_api_failed`, `critical_console_error`, `blank_page`, `inconclusive_preflight`.

### Agent rules

- A preflight failure is **not** automatically an app bug.
- `stale_deploy_suspected` means: redeploy cleanly + verify build BEFORE changing app logic.
- Do not patch business logic until preflight passes.
- Do not mark a run "passed" if preflight failed.
- Fetch compact proof first; fetch a screenshot only when the recommendation flag says so.
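
As a rough illustration of these rules, an agent could gate its next step on the preflight verdict. The verdict strings come from the list above; the function name and the returned step labels are hypothetical and exist only for this sketch.

```python
PREFLIGHT_VERDICTS = {
    "preflight_passed", "preflight_failed", "stale_deploy_suspected",
    "required_testid_missing", "build_marker_missing", "asset_hash_mismatch",
    "auth_issue", "route_mismatch", "critical_api_failed",
    "critical_console_error", "blank_page", "inconclusive_preflight",
}


def next_step_after_preflight(verdict: str) -> str:
    """Map a preflight verdict to a coarse next step (labels are illustrative)."""
    if verdict not in PREFLIGHT_VERDICTS:
        raise ValueError(f"unknown preflight verdict: {verdict}")
    if verdict == "preflight_passed":
        return "continue_with_scenario_results"
    if verdict == "stale_deploy_suspected":
        # Redeploy cleanly and verify the build before touching app logic.
        return "redeploy_and_verify_build"
    # Every other verdict: resolve the preflight problem first, and do not
    # patch business logic or mark the run as passed.
    return "resolve_preflight_before_patching"
```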

Refusal: invalid preflight configs return HTTP 400 `preflight_config_unsafe` BEFORE billing: no run created, 0 credits deducted.

Full walkthrough: https://testorax.com/docs/preflight-build-freshness

## Scenario Template Library

41 ready-made, agent-first scenario templates. Use these BEFORE writing custom scenarios by hand.

### Endpoints

- `GET /api/scenario-templates` — list (optional `?appFamily=` and `?category=`).
- `GET /api/scenario-templates/:id` — full detail with sample steps.
- `POST /api/scenario-templates/:id/generate` — generates a fillable scenario JSON. Body: `{"variables": {...}}`. Validates required inputs, refuses forbidden secret-shape values (auth headers / session cookie shapes / vendor secret keys / cloud access keys / JWT-shaped strings / literal password), refuses no-final-assertion / evaluate-only, blocks DELETE on non-sandbox hosts. 400 on refusal — no run created.

### Recommended loop

1. `list_scenario_templates({ appFamily, category })`
2. `get_scenario_template({ id })`
3. `generate_scenario_from_template({ id, variables })`
4. Submit the returned scenario via `/api/runs/bypass` with `executeOnly: true` (admin/sandbox).
5. Fetch `/api/runs/:id/compact-proof` — read `screenshotRecommendation`.
6. After patching, run Fix Check; only `verdict=fixed_verified` counts.

### MCP / CLI

- MCP: `list_scenario_templates`, `get_scenario_template`, `generate_scenario_from_template` (hosted + stdio).
- CLI: `testorax scenarios list|show|generate`.

### Agent rules

- Do NOT remove assertions to make tests pass — Fix Check checks `assertionsHash`.
- Do NOT paste real cookies / passwords / auth tokens into variables.
- Do NOT skip preflight; do NOT patch business logic until preflight passes.
- Fetch compact proof first; fetch screenshots only when recommended.

Full walkthrough: https://testorax.com/docs/scenario-template-library

Browsable UI gallery (filterable by app family / category / Fix Check
compatibility): https://testorax.com/scenario-templates. In the gallery you
can filter, preview, fill required inputs, generate scenario JSON, and start
a Flow Test in-page using the standard `POST /api/runs/start` path
(mode=workflow_test, customer X-Api-Key with `runs:start` scope).

The launcher applies a client-side precheck (assertion presence, preflight
presence, valid email, API key shape) before submitting. The server still
enforces the authoritative validation:

- run-submission schema
- **scenario lint applied AFTER workflow_test wrapping**, so no-final-assertion / destructive-on-production scenarios are refused before run creation
- preflight contract
- plan eligibility
- run-credit balance
- rate limit
- concurrency cap

Refused launches deduct 0 run credits.

`POST /api/runs/start` honors the `Idempotency-Key` header (server safety
correction): same key + same canonicalized body + same email → 200 with
`idempotentReplay:true` and the existing runId; same key + DIFFERENT body
→ 409 `idempotency_conflict` with the prior runId surfaced. Scope is
(key + email + endpoint). The launcher derives a stable per-(template,
generated-payload, email) key automatically.
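
The replay / conflict behaviour described above can be modelled in a few lines. This is an in-memory sketch under stated assumptions, not the server's code: the class name, the sorted-key JSON canonicalization, the SHA-256 body hash, the status code for a first submission, and the field names in the conflict response are all illustrative.

```python
import hashlib
import json


def canonical_body_hash(body: dict) -> str:
    # Assumed canonicalization: JSON with sorted keys and no extra whitespace.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyStore:
    """Toy model of the (key + email + endpoint) idempotency scope."""

    def __init__(self) -> None:
        self._seen: dict[tuple[str, str, str], tuple[str, str]] = {}

    def submit(self, key: str, email: str, endpoint: str, body: dict, new_run_id: str):
        scope = (key, email, endpoint)
        body_hash = canonical_body_hash(body)
        prior = self._seen.get(scope)
        if prior is None:
            # First time this scope is seen: a new run is recorded.
            self._seen[scope] = (body_hash, new_run_id)
            return 200, {"runId": new_run_id}
        prior_hash, prior_run_id = prior
        if prior_hash == body_hash:
            # Same key + same canonicalized body + same email: replay.
            return 200, {"runId": prior_run_id, "idempotentReplay": True}
        # Same key, different body: conflict, with the prior runId surfaced.
        return 409, {"code": "idempotency_conflict", "runId": prior_run_id}
```

The practical consequence for agents: retrying an identical request with the same key is safe, while reusing a key for a changed payload is refused rather than silently starting a second run.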

Credentialed flows (login forms / cookies / saved sessions) should use
`/account/test` or Login Memory (`/docs/login-memory`) — the gallery
launcher does not collect passwords / cookies / storageState / API keys
for the target app.

## Fix Check Same-Test Lock

Makes `fixed_verified` impossible to fake.

**Endpoint:** `POST /api/runs/:id/fix-check/lock` body `{verificationRunId, reconcilerVerdict?}`. Owner/admin only. Cross-tenant guarded. NO run created, NO credit deducted.

**12 closed verdicts:** fixed_verified / still_failing / proof_weakened / scope_changed / assertion_weakened / preflight_weakened / testids_weakened / route_changed / template_changed / proof_disappeared / inconclusive / unsafe_to_claim_fixed.

**Blocked from fixed_verified when:** templateId changed, assertionsHash diverged, required assertion removed, required preflight removed, required testId removed, target route changed, network assertion disappeared, backend / screenshot / cleanup proof was on original but gone in verification.

**Agent rules:** Do NOT remove assertions, skip preflight, swap templates, change route, or remove proof in the verification run. Any non-`fixed_verified` lock outcome means `unsafeToClaimFixed=true` — do NOT claim "fixed".

Full walkthrough: https://testorax.com/docs/fix-check-same-test-lock

## Code-Aware Fix Hints

Strictly advisory. When Testorax proves a failure, this surface generates safe read-only hints showing likely source files / symbols / routes / API handlers to inspect.

**Endpoint:** `POST /api/runs/:id/code-hints` body `{files: [{path, source}, ...]}`. Owner/admin only. Cross-tenant guarded. NO public-by-runId. NO LLM. NO auto-patch. NO raw code in output. Body capped 256 KB; per-file 200 KB. Auto-excludes node_modules / .next / dist / build / .env* / lockfiles / binaries.

**MCP:** `get_code_hints({runId, files})` (hosted + stdio). **CLI:** `testorax code-hints <runId> [--root <dir>] [--max-files N] [--json]`.

**Confidence:** `high` (exact testId or API-path match + risk-tag + runtime evidence aligns) / `medium` (approximate match) / `low` (single weak signal).

**Agent rules:** Do NOT auto-apply. Do NOT claim "root cause" from static match alone. Inspect candidates first; cross-check via Compact Proof + Run Intelligence; then patch and run Fix Check. Only `verdict=fixed_verified` counts.
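
For callers that assemble the `files` payload themselves, a local pre-filter can mirror the documented caps and exclusions before anything is sent. This is a hedged sketch: the helper names, the exact lockfile and binary-extension lists, and the reading of "KB" as 1024 bytes are assumptions, and the server remains the authority on what it accepts.

```python
import json
from pathlib import PurePosixPath

MAX_BODY_BYTES = 256 * 1024   # documented body cap (1 KB = 1024 bytes assumed)
MAX_FILE_BYTES = 200 * 1024   # documented per-file cap
EXCLUDED_DIRS = {"node_modules", ".next", "dist", "build"}
LOCKFILES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml"}  # illustrative list
BINARY_SUFFIXES = {".png", ".jpg", ".gif", ".zip", ".pdf", ".woff2"}  # illustrative list


def is_excluded(path: str) -> bool:
    p = PurePosixPath(path)
    if EXCLUDED_DIRS.intersection(p.parts[:-1]):  # only directory components
        return True
    return (
        p.name.startswith(".env")
        or p.name in LOCKFILES
        or p.suffix.lower() in BINARY_SUFFIXES
    )


def build_code_hints_body(sources: dict[str, str]) -> dict:
    """Build {"files": [...]} while staying under both size caps."""
    files: list[dict[str, str]] = []
    for path, source in sources.items():
        if is_excluded(path) or len(source.encode("utf-8")) > MAX_FILE_BYTES:
            continue
        candidate = files + [{"path": path, "source": source}]
        if len(json.dumps({"files": candidate}).encode("utf-8")) > MAX_BODY_BYTES:
            break  # adding this file would exceed the body cap
        files = candidate
    return {"files": files}
```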

Full walkthrough: https://testorax.com/docs/code-aware-fix-hints

## Run Submission Schema Contract (Fast Bug Scan auth reliability)

Before submitting an authenticated run, read the closed-set contract.

**Endpoints:** `POST /api/runs/start` (mode-aware) · `POST /api/runs/bypass` (admin-tier).

**Auth strategies:** `storage_state` · `cookies` · `login_form` · `session_injection` · `login_memory` · `steps`. Pick exactly one. Mixing fields incompatible with the declared strategy returns `400 auth_mixed_strategies`.

**Apply timing:** cookies / storageState / login_form / steps are ALL applied before the first protected goto. The engine sets `authDiagnostics.appliedBeforeFirstGoto=true` on success.

**authEvidence on report.json:** Every authenticated FBS report carries a top-level `authEvidence` block with `used / strategyAttempted / strategySucceeded / appliedBeforeFirstGoto / targetUrl / lastSeenUrl / redirectedToLogin / scopedPathsAttempted / pagesCrawled / failureReason`. Booleans + counts + strategy names only — no raw cookies, passwords, headers, storageState, or CSRF tokens.

**pagesCrawled = 0 is normal for LCA runs.** Use `scopedPathsAttempted` and `coverageSummary` for depth.

**Refusal codes:** `invalid_run_mode | url_required | invalid_url | auth_strategy_unknown | auth_login_form_required | auth_cookies_required | auth_storage_state_required | auth_unsupported_field | auth_mixed_strategies | auth_session_injection_unsafe | invalid_scoped_paths | invalid_scoped_path_entry | executeOnly_requires_scenarios | skipCrawl_requires_scenarios_or_steps | invalid_scenarios_array | invalid_template_inputs | invalid_preflight | invalid_viewport | payload_too_large`. All return 400 BEFORE billing — no run created, 0 credits deducted.

**Schema JSON:** `GET /api/docs/run-submission-schema.json` (machine-readable, public).

**Agent rules:**
- Read `authEvidence.strategySucceeded` BEFORE drawing conclusions.
- `appliedBeforeFirstGoto=true` + still-failing means the app rejected the session (config / domain / expiry) — NOT a Testorax timing bug.
- `redirectedToLogin=true` is a strong signal the auth payload was wrong / expired / wrong domain. Do NOT patch app code from this signal alone.
- NEVER paste real cookies / passwords / tokens in public docs / prompts / commits.
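
One way to apply these rules is to read the `authEvidence` block in a fixed order before looking at anything else in the report. The sketch below is illustrative only: the function name and the returned labels are hypothetical, the check order is an assumption, and `strategySucceeded` is treated as truthy / falsy because the document does not state whether it is a boolean or a strategy name.

```python
def interpret_auth_evidence(ev: dict) -> str:
    """Return a coarse reading of an authEvidence block (labels are illustrative)."""
    if not ev.get("used"):
        return "auth_not_used"
    if ev.get("strategySucceeded"):
        return "auth_ok"
    if ev.get("redirectedToLogin"):
        # Strong signal that the auth payload was wrong, expired, or for the
        # wrong domain. Not a reason to patch app code on its own.
        return "auth_payload_suspect"
    if ev.get("appliedBeforeFirstGoto"):
        # Auth was applied in time but the app still rejected the session.
        return "app_rejected_session"
    return "auth_inconclusive"
```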

Full walkthrough: https://testorax.com/docs/run-submission-schema

## Report Evidence Contract (Passing Run Evidence + Console/Network Evidence)

PASS is NOT automatically strong proof. Every authenticated FBS / scenario_runner / workflow_test / full_crud_e2e report carries:

- `passingRunEvidence` — closed-shape block with `proofStrength` ∈ `{strong, moderate, weak, inconclusive}`, `stepsExecuted`, `assertionsExecuted`, `requestsObserved`, `consoleErrorsObserved`, `failedRequestsObserved`, `whatIsProven[]`, `whatIsNotProven[]`, `doNotClaim[]`.
- `stepEvidence[]` — one row per step (passed + failed + skipped) with stable `step_NNN` ids, `selectorOrTarget` (redacted), `urlBefore` / `urlAfter`, `durationMs`, `assertionResult`, `requestEvidenceIds[]`, `consoleEvidenceIds[]`, `networkEvidenceIds[]`.
- `runtimeEvidence.console` — `errors[]` / `warnings[]` / `exceptions[]` with `console_NNN` ids; `preview` is redacted + clipped at 240 chars; `countByLevel`.
- `runtimeEvidence.network` — `requests[]` / `failedRequests[]` / `redirects[]` with `net_NNN` ids; `requestBodyShape` / `responseBodyShape` show key names + types + counts only — values NEVER stored. `authCookiePresent` / `csrfPresent` / `authorizationHeaderPresent` are booleans only — never raw headers.

**proofStrength rules**: PASS without any assertion → weak. PASS + assertion + no console errors + no failed requests → strong. PASS + assertion + errors → moderate. PASS with auth issue → weak (never strong).
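
Read as a decision procedure, the four rules could look like the following. This is a minimal sketch, not the engine's code: the function name and parameters are hypothetical, "errors" is assumed to mean console errors or failed requests, the auth rule is assumed to take precedence, and `inconclusive` is left out because its trigger is not described here.

```python
def proof_strength(
    has_assertion: bool,
    console_errors: int,
    failed_requests: int,
    auth_issue: bool = False,
) -> str:
    """Classify a PASS run as 'strong', 'moderate', or 'weak'."""
    if auth_issue:
        return "weak"      # PASS with auth issue: weak, never strong
    if not has_assertion:
        return "weak"      # PASS without any assertion
    if console_errors == 0 and failed_requests == 0:
        return "strong"    # assertion + clean console + clean network
    return "moderate"      # assertion present, but errors were observed
```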

**Agent rules**:
- Read `passingRunEvidence.proofStrength` BEFORE drawing conclusions.
- weak / inconclusive PASS — re-run with at least one explicit assertion BEFORE patching app code.
- Do NOT claim persistence without a passing `readback_assert`.
- Do NOT claim cleanup / rollback without a passing `verify_cleanup`.
- Reference Evidence IDs (`step_001` / `net_001` / `console_001`) when making claims.

**Compact Proof integration**: `GET /api/runs/:id/compact-proof` embeds a small `passingRunEvidence` block (proofStrength, counts, doNotClaim, topProvenClaim).

**Proof Packet integration**: contract bumped 1.6.0 → 1.7.0 — adds `passingRunEvidence`, `stepEvidence[]`, `runtimeEvidence` blocks + companion evidence row.

**Schema JSON**: `GET /api/docs/report-evidence-contract.json`.

Full walkthrough: https://testorax.com/docs/report-evidence-contract

## Fast Bug Scan Reality Grounding (no hallucinated controls)

Fast Bug Scan only tests **real controls observed in the live DOM / accessibility snapshot**. Every report carries a closed-shape `coverageGrounding` block:

- `observedControls[]` — ids `ctrl_NNN`, redacted textPreview, selector candidates, testId, role, zone ∈ `{shell, module_body, unknown}`, signature (8-char stable hash). NO raw DOM.
- `groundedActionsExecuted` / `ungroundedActionsBlocked` — every action mapped to an observed-control id; selectors that don't match get blocked.
- `coverageStatus` ∈ `{real_coverage | partial_coverage | shell_only_coverage | no_real_coverage | inconclusive}`.
- `controlsClickedByZone:{shell, module_body, unknown}` + `moduleBodyCoverageStatus` ∈ `{tested, partial, untested, no_module_body_controls}` — separates dashboards "tested via sidebar only" from real coverage.
- `blockedUngroundedActions[]` — ids `blocked_NNN` with `reason ∈ {observed_control_clicked | ungrounded_action_blocked | hallucinated_control_candidate | shell_only_coverage | selector_not_observed | no_real_coverage}`.
- `doNotClaim[]` / `whatIsProven[]` / `whatIsNotProven[]` — hard rules; weak/no coverage attaches "Do not claim app behavior was tested" + "Do not classify missing controls as a product bug".
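
The grounding idea (every action maps to an observed control, everything else is blocked) can be illustrated with a toy matcher. This is a sketch under assumptions: exact selector-string matching, the `ObservedControl` class, and the output row shapes are hypothetical; only the `ctrl_NNN` / `blocked_NNN` id patterns, the zone values, and the `selector_not_observed` reason come from the list above.

```python
from dataclasses import dataclass


@dataclass
class ObservedControl:
    id: str                 # e.g. "ctrl_001"
    selectors: list[str]    # selector candidates seen in the live DOM
    zone: str = "unknown"   # shell | module_body | unknown


def ground_actions(observed: list[ObservedControl], proposed_selectors: list[str]):
    """Split proposed actions into grounded (executed) and blocked rows."""
    by_selector = {s: c for c in observed for s in c.selectors}
    executed, blocked = [], []
    for selector in proposed_selectors:
        control = by_selector.get(selector)
        if control is not None:
            executed.append(
                {"controlId": control.id, "selector": selector, "zone": control.zone}
            )
        else:
            # Never invent a control: a selector that was not observed is blocked.
            blocked.append(
                {
                    "id": f"blocked_{len(blocked) + 1:03d}",
                    "selector": selector,
                    "reason": "selector_not_observed",
                }
            )
    return executed, blocked
```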

**Compact Proof embeds**: `coverageGrounding` summary (counts + status + doNotClaim).

**Proof Packet contractVersion 1.7.0 → 1.8.0** embeds the full `coverageGrounding` block + companion `kind:"other"` evidence row.

**Agent rules**:
- Read `coverageGrounding.coverageStatus` BEFORE drawing conclusions on FBS.
- shell_only_coverage / no_real_coverage → do NOT claim app behavior is tested. Re-run with explicit scopedPaths or scenarios targeting the module body.
- NEVER classify a missing invented control as a product bug. The engine refused to invent it.

Schema JSON: `GET /api/docs/coverage-grounding.json`.

Full walkthrough: https://testorax.com/docs/coverage-grounding

## Action Fidelity Contract (synthetic-click + native-input detection)

A JS `.click()` is NOT the same as a real user click. An `evaluate`-set value is NOT the same as native typing. Reports carry a closed-shape `actionFidelity` block:

- `summary` — `{realPointerActions, nativeFillActions, keyboardTypeActions, programmaticClicks, evaluateFallbacks, weakFidelityActions, ambiguousActions, weakFidelityPresent}`.
- `actions[]` — per-action rows with id `fidelity_NNN`, `fidelity` ∈ `{real_pointer | native_fill | keyboard_type | programmatic_click | evaluate_fallback | synthetic_event | ambiguous | not_applicable}`, proof booleans `{pointerProof, keyboardProof, focusProof, hoverProof}`, `eventSequence:{pointerdown, mousedown, mouseup, click, input, change, keydown, keyup}`, `weaknessReason`, `whatIsNotProven[]`.
- `weakFidelityWarnings[]` + `doNotClaim[]`.

**PassingRunEvidence integration**: when `weakFidelityPresent=true`, `proofStrength` is capped at `moderate` (assertion present) or `weak` (no assertion). Never `strong`. doNotClaim attaches "Do not claim real-user interaction fidelity was proven".
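
The cap can be pictured as a ceiling applied after the base `proofStrength` is known. A minimal sketch, assuming the ordering weak < moderate < strong; the function name is hypothetical, and `inconclusive` is passed through unchanged because the document does not say how the cap interacts with it.

```python
def cap_for_weak_fidelity(strength: str, weak_fidelity_present: bool, has_assertion: bool) -> str:
    """Apply the weak-fidelity ceiling to an already computed proofStrength."""
    if not weak_fidelity_present:
        return strength
    rank = {"weak": 0, "moderate": 1, "strong": 2}
    if strength not in rank:
        return strength  # e.g. "inconclusive" is left untouched (assumption)
    ceiling = "moderate" if has_assertion else "weak"
    # Lower the value to the ceiling if needed; never raise it.
    return strength if rank[strength] <= rank[ceiling] else ceiling
```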

**Compact Proof embeds**: small `actionFidelity` summary block (~250 bytes).

**Proof Packet contractVersion 1.8.0 → 1.9.0** embeds the full `actions[]` array + companion `kind:"other"` evidence row.

**Agent rules**:
- Read `actionFidelity.summary.weakFidelityPresent` FIRST on PASS runs.
- Programmatic click / evaluate fallback / synthetic event / ambiguous → do NOT claim user-fidelity / accessibility / hover behaviour.
- `native_fill` does NOT prove keystroke fidelity — use `type` for that.
- NEVER claim "the user can save" from a JS `.click()` — bind a real pointer click to verify.

Schema JSON: `GET /api/docs/action-fidelity.json`.

Full walkthrough: https://testorax.com/docs/action-fidelity

## Smart Queue (Stage A — queue visibility + customer ETA)

Every successful `POST /api/runs/start` response now carries a small
`queueEta` field for customer-facing UIs (example values shown, other
response fields omitted):

```json
{
  "queueEta": {
    "status": "normal",
    "estimatedStartSeconds": 0,
    "confidence": "medium",
    "message": "Your run is queued. Estimated start: under 1 minute."
  }
}
```

`status` is one of `normal`, `busy`, `saturated`, `unknown`; `confidence`
is one of `low`, `medium`, `high`.

ETA failures NEVER block run creation (the field is silently omitted).

Admin/super-admin telemetry: `GET /api/admin/capacity/snapshot`
(contract version 1.0.0, requires admin auth, no customer PII). The
existing `/api/admin/capacity` feed is preserved unchanged.

Stage A is observation only — scheduler routing, run pricing,
run-credit deduction, and worker behavior are unchanged.

Full walkthrough: https://testorax.com/docs/smart-queue

## Smart Queue Stage B (admin capacity alerts)

Admin/super-admin only. Recommendation-only. Read-only.

`GET /api/admin/capacity/alerts` (contract 1.0.0) returns deterministic
alerts derived from the same data sources as the Stage A snapshot.

8 alert types (closed set): `queue_wait_busy`, `queue_wait_saturated`,
`worker_stale`, `failure_surge`, `timeout_surge`,
`deep_jobs_blocking_fast`, `underutilization`, `insufficient_data`.

Every alert carries `observedValue`, `threshold`, `sourceMetric`,
`confidence` ∈ {low,medium,high}, `recommendedAction`, and
`safeToAutomate: false`. Missing signal emits `insufficient_data`
rather than inventing a number.

No scheduler routing change. No auto-scaling. No billing change.

## Smart Queue Stage C (smart run scheduling)

Stage C separates queues by mode, defines priority rules, and adds
per-user / per-app concurrency caps. Default OFF in production.

The pure decision contract ships at version `SCHEDULER_DECISION_CONTRACT_VERSION = 1.0.0`.
4 decision kinds (closed set): `claim`, `defer`, `skip_concurrency_cap`, `no_eligible_runs`.
4 cap names: `per_user_active`, `per_app_active`, `per_account_queued`, `per_worker_heavy`.

Default fairness caps (dispatch-side, only when SMART_DISPATCH_ENABLED='true'):
- perUserActive=4, perAppActive=8, perAccountQueued=16, perWorkerHeavyCap=1.
- Admin accounts bypass per-user/per-app/per-account caps but still observe
  reserved-fast-slot + cost-tier policies.

Customer queue position: `/api/runs/start` response now carries
`queueEta.ownQueuePosition` (own-account only — never exposes other
customers' run IDs/emails/lanes/URLs).

Admin dry-run telemetry: `/api/admin/capacity/snapshot.proposedFairness`
+ 2 new alert types `fairness_cap_hit_user` / `fairness_cap_hit_app`.

No scheduler routing change is enabled in production by default. An
operator must explicitly set `SMART_DISPATCH_ENABLED='true'` on a single
worker after the Stage C.0/C.1/C.2 soak window.

## Smart Queue Stage D (burst worker recommendation)

`GET /api/admin/capacity/burst-recommendation` (admin-only, contract 1.0.0).
Recommendation-only. NEVER calls Hetzner API. NEVER provisions a worker.
`safeToAutomate` is hard-coded false on every response.

4-value enum: `add_worker` / `remove_burst_worker` / `keep_current` / `insufficient_data`.

Rules:
- `add_worker` when p95 wait ≥ 300s with queued runs OR queue depth > 4× healthy workers.
- `remove_burst_worker` when slot utilization < 20% with queue empty AND sample count ≥ 10.
- `insufficient_data` when no worker heartbeat OR low traffic + low samples OR all workers stale.
- Failure/timeout surge ≥ 50% → forced `keep_current` with reason "scaling up would mask root cause."
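
To make the rule set easier to reason about, here is one possible reading of it as a pure function. It is a sketch, not the shipped logic: the `CapacitySignals` fields, the evaluation order (insufficient data first, then the surge override, then add, then remove), and treating "low samples" as fewer than 10 are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapacitySignals:
    healthy_workers: int                  # workers with a fresh heartbeat
    total_workers: int                    # all known workers, stale or not
    queued_runs: int
    p95_wait_seconds: Optional[float]
    slot_utilization: Optional[float]     # 0.0 to 1.0
    sample_count: int
    failure_or_timeout_rate: float        # 0.0 to 1.0
    low_traffic: bool = False


def recommend(s: CapacitySignals) -> str:
    no_heartbeat = s.total_workers == 0
    all_stale = s.total_workers > 0 and s.healthy_workers == 0
    low_samples = s.sample_count < 10
    if no_heartbeat or all_stale or (s.low_traffic and low_samples):
        return "insufficient_data"
    if s.failure_or_timeout_rate >= 0.5:
        # Scaling up would mask the root cause.
        return "keep_current"
    waiting_too_long = (
        s.p95_wait_seconds is not None
        and s.p95_wait_seconds >= 300
        and s.queued_runs > 0
    )
    queue_too_deep = s.queued_runs > 4 * s.healthy_workers
    if waiting_too_long or queue_too_deep:
        return "add_worker"
    idle = (
        s.slot_utilization is not None
        and s.slot_utilization < 0.20
        and s.queued_runs == 0
        and s.sample_count >= 10
    )
    return "remove_burst_worker" if idle else "keep_current"
```

Whatever the function returns, the result stays a recommendation: `safeToAutomate` is false on every response, so nothing here should trigger provisioning.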

Anti-fabrication: `notes.unmeasured` explicitly declares worker CPU / RAM / browser saturation as unmeasured.

## Smart Queue Stage E (burst recommendation history + drift)

`POST /api/admin/capacity/burst-recommendation-snapshot` (admin-only)
captures the current Stage D recommendation into the
`burst_recommendation_snapshots` table. Idempotent on 10-min window.
Writer hard-codes `safe_to_automate=0`.

The existing `*/10 * * * *` cron trigger ALSO writes one snapshot per
10-minute window automatically through the same writer helper
(`captureBurstRecommendationSnapshot`). No new cron, no Hetzner API
call, same idempotency on `window_start` PK. Manual POST + auto-cron
both go through the same code path so audit invariants are identical.

`GET /api/admin/capacity/burst-recommendation-history` (admin-only,
contract 1.0.0) returns:
- `latest` — most recent snapshot
- `last24h` — snapshot count + recommendation counts + change count + consecutive-add/remove signals
- `drift` — 5-value enum: stable / oscillating / add_pressure / remove_pressure / unknown
- `changes` — recommendation-change timeline (capped 25)
- `notes.hetznerApiCalled: false`, `notes.historyOnly: true`, `notes.recommendationOnly: true`

History/recommendation only. NO Hetzner API call. NO auto-provisioning.
NO automation buttons in the admin UI panel.

## Smart Queue Stage F (operator action ledger)

`POST /api/admin/capacity/operator-action` (admin-only) appends one row
to the `capacity_operator_actions` ledger (additive migration 0075).
Body: `{ action, snapshotWindowStart?, recommendation?, note? }`.
`action` enum (4 values): `acknowledged | declined | acted_manually | deferred`.
Invalid action → 400 `code=invalid_action`. `note` hard-capped at 400 chars
at write time. `snapshotWindowStart` (optional) must match the strict
10-min UTC space format if supplied.

`GET /api/admin/capacity/operator-actions` (admin-only, contract `1.0.0`)
returns:
- `latest` — last 50 ledger rows (7-day window)
- `last7d.countsByAction` — { acknowledged, declined, acted_manually, deferred }
- `lastActionPerRecommendation` — most recent action per snapshot window (cap 25)
- `notes.ledgerOnly: true`, `notes.hetznerApiCalled: false`,
  `notes.autoProvisioningChanged: false`, `notes.billingChanged: false`,
  `notes.schedulerRoutingChanged: false`

LEDGER ONLY. NO Hetzner API call. NO auto-provisioning. NO automation
buttons in the admin UI panel. The operator decides + acts manually
outside Testorax; this ledger captures the decision for audit.
`actor_type` is derived from auth path (admin_session | admin_key),
never from caller payload (anti-spoofing).

## Autonomous QA Campaign Orchestrator — Stage A (campaign preview)

**Read-only preview**. Given a target URL, Testorax inspects public HTML
(sitemap.xml + the root) and returns a campaign plan: route inventory,
control inventory, scenario manifest, and a **page-run cost estimate**
the customer must confirm BEFORE any run is created.

Stage A invariants (every preview):

- Does NOT mutate the target.
- Does NOT create a run row.
- Does NOT deduct credits or wallet.
- Does NOT require an API key for public targets.
- Always returns `confirmationRequiredBeforeRun: true`.

**Endpoint**: `POST https://testorax.com/api/campaigns/preview`
Body: `{ "targetUrl": "https://...", "manualSeedRoutes"?: ["/pricing", ...], "hasAuthProfile"?: false }`.
Rate-limited 30/min/IP. Body cap 16 KB. SSRF-guarded.

**MCP tool** (stdio + hosted): `campaign_preview` with the same args.
**CLI**: `testorax campaign preview <url> [--seed /x] [--has-auth] [--json]`.

**Closed-set vocab agents must respect** (do not invent values):

- `controlType` (18): `button` · `link` · `dropdown` · `dropdown_option` ·
  `tab` · `input` · `search` · `filter` · `sort` · `checkbox` · `radio` ·
  `modal_trigger` · `drawer_trigger` · `table` · `pagination` ·
  `file_upload` · `destructive_action` · `unknown`
- `safeClassification` (8): `safe_readonly` · `safe_interaction` ·
  `sandbox_mutation_candidate` · `destructive_blocked` · `payment_blocked` ·
  `email_send_blocked` · `auth_blocked` · `unknown_needs_review`
- `scenarioLayer` (9): `page_render` · `interaction` · `form_fill` ·
  `modal_drawer` · `search_filter` · `cross_module` · `safe_mutation` ·
  `security_boundary` · `blocked`
- `proofScopeLabel` (16): `route_render` · `visible_ui_click` ·
  `modal_open` · `field_visible` · `form_fill` · `search_filter` ·
  `save_persistence` · `backend_api` · `cross_module_reflection` ·
  `mutation` · `cleanup` · `partial` · `blocked` · `false_positive_likely` ·
  `runner_mismatch` · `chrome_confirmation_needed`
- `recommendedMode` (5): `quick_scan` · `deep_scan` · `launch_audit` ·
  `full_regression` · `mutation_safe_audit`
- `runnerMismatchKind` (5): `route_loaded_in_runner_not_chrome` ·
  `control_visible_in_chrome_not_runner` ·
  `runner_click_failed_chrome_click_works` ·
  `api_proof_without_visible_ui_proof` · `selector_mismatch_likely`

**Run-balance pricing rule**: one selected page-run consumes one run
credit (`runCreditRule = "one_selected_page_run_consumes_one_run_credit"`).
The preview NEVER deducts. Customer-facing cost is denominated in
**run credits**, not invented dollar amounts.

If the caller forwards an `X-Api-Key`, the preview reads their wallet
(read-only) and returns `pricingStatus` with one of:

- `authenticated_enough_runs` — `requiresPurchaseBeforeRun: false`;
  `availableActions: ["confirm_run"]`; runs-balance fields populated.
- `authenticated_insufficient_runs` — surfaces `runsShortfall` + public
  `pricingOptions[]`; `requiresPurchaseBeforeRun: true`.
- `anonymous_pricing_preview` — no key presented; surfaces public plan +
  Starter Pack options; `requiresPurchaseBeforeRun: true`.
- `pricing_unavailable` — balance lookup errored. Do NOT invent a dollar
  total. `requiresPurchaseBeforeRun: true`.

`availableActions` closed set: `confirm_run` · `upgrade_plan` ·
`buy_run_pack` · `pay_per_run` · `sign_in`.

**Forbidden**: internal cost (tokens, model cost, browser cost, worker
cost, server cost, internal weights) is NEVER returned. Customer-facing
language is always *runs*, never internal units.

**Discovery method**: bounded discovery with two engines.

- **Stage B (default)** — HTTP/static per-route discovery via Cloudflare
  Worker `fetch` + regex extraction. **No Playwright, no real browser,
  no JavaScript execution, no DOM hydration.** Caller passes
  `discoveryDepth: "static" | "http_static" | "future_browser_deep"`
  (legacy aliases `browser_limited` / `browser_deep` accepted) and
  `maxRoutesToInspect` (default 10, hard cap 25).
- **Stage C (env-flag gated)** — real-browser preview via the Testorax
  Hetzner Playwright runner. Caller passes
  `discoveryDepth: "real_browser_preview"` and `maxRoutesToInspect`
  (default 3, hard cap 10). When env `STAGE_C_BROWSER_PREVIEW_ENABLED=true`,
  the worker mints a `previewJobId` (shape `prv_<22>`) and queues a
  Hetzner job with `mode='campaign_preview_browser'`. Caller polls
  `GET /api/campaigns/preview/jobs/:id` for the per-route hydrated
  control inventory. When the env flag is off, the synchronous response
  surfaces `realBrowserFallbackReason: "feature_flag_disabled"` —
  the worker NEVER silently presents static as browser.

Per-route discovery results surface in `controlInventory.byRouteSummary[]`
with closed-set `discoveryMethod` — Stage B values: `http_route_fetch`,
`static_html_extraction`, `static_html_fallback`, `sitemap_only`,
`auth_blocked`, `error`. Stage C adds: `real_browser_preview`,
`real_browser_preview_hydration_failed`, `real_browser_preview_capped`.

`discoveryLimitations[]` is the closed-set list of what the runner did
NOT do — Stage B baseline: `no_javascript_execution` ·
`no_playwright_browser` · `spa_shell_may_hide_controls` ·
`authenticated_controls_not_visible` · `forms_not_submitted` ·
`unsafe_controls_not_clicked`. Stage C drops the first two when the
runner ran for that route, AND adds `dropdown_options_not_expanded` and
`modal_contents_not_opened` (Stage C ships with safe expansion DEFERRED
until Stage D).

**Stage C invariants** (HARD — runner-side enforced):
`consumeQuota:false`, `chargedRunCredit:0`, `submitForms:false`,
`clickDestructive:false`, `clickPayment:false`, `sendEmail:false`,
`uploadFiles:false`, `expandDropdowns:false`, `openModals:false`,
`useAuthProfile:false`. Caps: 3 routes default, 10 hard cap, 20s
per-route, 60s total, 5 screenshots, 256 KB output.

**Stage E — Live progress / SSE telemetry**: every preview job emits a
machine-readable `progress` snapshot on every state transition. Three
read paths (no API key required, no credit deducted):

- `GET /api/campaigns/preview/jobs/:id/progress` — JSON snapshot, rate-
  limited 120/min/IP. Returns `{ ok, progress: { contractVersion,
  status, progressPercent, currentRoute, currentPageIndex, totalPages,
  completedPages, remainingPages, currentPhase, currentDiscoveryMethod,
  currentEngine, startedAt, updatedAt, elapsedSeconds,
  estimatedSecondsRemaining, lastEvent, latestError, stopReason,
  routesCompleted[], routesRemaining[], isFinal,
  confirmationRequiredBeforeRun:true, chargedRunCredit:0,
  consumeQuota:false } }`. ETA is `null` when there isn't enough data —
  never fabricated.
- `GET /api/campaigns/preview/jobs/:id/events` — Server-Sent Events.
  Closed-set event types: `progress` · `route_started` ·
  `route_completed` · `job_capped` · `job_failed` · `job_complete` ·
  `heartbeat`. Stream closes on `isFinal=true` or 60s cap.
- Website live view at `/campaigns/preview/jobs/:id` — uses SSE with
  automatic poll fallback.

CLI: `testorax campaign watch <prv_xxxx> [--json] [--interval 2]`
renders a live progress bar + ETA. MCP: `campaign_progress` tool
returns the snapshot for agents (poll every 2-3s; stop once
`isFinal=true`).
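
The polling guidance can be sketched as a small loop that is independent of how the snapshot is fetched. This is an illustrative helper, not part of any Testorax SDK: `wait_for_preview`, `fetch_progress`, and `max_polls` are hypothetical names, and `fetch_progress` is assumed to return the inner `progress` object (via the JSON endpoint or the `campaign_progress` tool).

```python
import time
from typing import Callable


def wait_for_preview(
    fetch_progress: Callable[[], dict],
    interval_seconds: float = 2.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the snapshot reports isFinal, or until max_polls is reached."""
    last: dict = {}
    for _ in range(max_polls):
        last = fetch_progress()
        if last.get("isFinal"):
            return last
        sleep(interval_seconds)
    # Gave up without a final snapshot; the caller decides what to do next.
    return last
```

Injecting `sleep` keeps the loop testable without real delays, and the `max_polls` bound prevents an agent from polling forever if a job never reaches a final state.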

Closed-set `status` (8): `queued` · `running` · `inspecting` ·
`testing` · `capped` · `failed` · `complete` · `stopped`.

**Stage C.1 — runner shipped** (Hetzner Playwright consumer):
`apps/worker/src/jobs/campaignPreviewBrowserJob.ts` reads queued
preview-job rows (`tier='campaign_preview_browser'`), validates the
HARD invariants BEFORE launching Chromium, navigates each selected
route in a real headless browser with a bounded hydration wait, and
returns a `BrowserPreviewJobResult` with hydrated control counts
(`controlsByType`, `controlsByClassification`, `controls[]` cap 50)
through the same `/api/campaigns/preview/jobs/:id` polling endpoint.
The runner is read-only: no clicks, no submits, no events fired.
A self-contained JS-rendered fixture is available at
`/api/campaigns/preview/fixture/js-rendered` for proof tests — static
HTTP extraction sees just the loading placeholder, real-browser
preview sees the hydrated control inventory.

**Stage D — authenticated preview shipped**: the same runner now
accepts a Login Memory profile reference and inspects authenticated
routes. **Caller passes ONLY the safe profile reference, never raw
secrets:**

- Request body: `authScope: "authenticated_preview"` +
  `authProfileId: "login_<22>"` (or `authProfileName` for back-compat).
- The Worker rejects any body containing raw `cookies`, `cookieJar`,
  `cookieHeader`, `storageState`, `localStorage`, `sessionStorage`,
  `password`, `token`, `accessToken`, `idToken`, `refreshToken`,
  `apiKey`, `authorization`, `bearer`, `auth`, `credentials`,
  `sessionCookie` — closed-set forbidden field list.
- The Worker validates ownership: `login_memory_sessions.email` must
  match the X-Api-Key caller's email (admin keys bypass).
- The Hetzner runner is the ONLY consumer that decrypts the vault
  blob (via existing `/api/internal/auth-sessions/:id/decrypt-blob`
  with X-Internal-Secret).
- Per-route response carries closed-set `authDiscoveryStatus` ∈
  `not_requested` · `authenticated` · `redirected_to_login` ·
  `unauthorized` · `forbidden` · `profile_missing` · `profile_invalid` ·
  `profile_expired` · `profile_forbidden` · `unknown`.
- `discoveryMethod` flips to `real_browser_preview_authenticated`
  ONLY for routes inspected with auth + hydrated.
- Stage D invariants HARD: `chargedRunCredit:0`, no form submission,
  no destructive/payment/email controls clicked. The only invariant
  Stage D may flip to `true` is `useAuthProfile` — and only when
  ownership is verified.
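
A caller can pre-screen a request body for the forbidden field names before sending it. The sketch below is a client-side convenience under stated assumptions: it matches names exactly and case-sensitively and walks nested objects and arrays, which may differ from how the Worker actually matches. `find_forbidden_fields` is a hypothetical helper, not part of any Testorax SDK.

```python
from typing import Any, List

# Field names copied from the closed-set forbidden list above.
FORBIDDEN_FIELDS = frozenset({
    "cookies", "cookieJar", "cookieHeader", "storageState", "localStorage",
    "sessionStorage", "password", "token", "accessToken", "idToken",
    "refreshToken", "apiKey", "authorization", "bearer", "auth",
    "credentials", "sessionCookie",
})


def find_forbidden_fields(value: Any, path: str = "") -> List[str]:
    """Return dotted paths of every key whose NAME is on the forbidden list."""
    hits: List[str] = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else str(key)
            if key in FORBIDDEN_FIELDS:
                hits.append(child_path)
            hits.extend(find_forbidden_fields(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_forbidden_fields(child, f"{path}[{index}]"))
    return hits
```

A body that carries only `authScope` and `authProfileId` yields an empty list; anything else on the list should be removed before the request is sent.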

**Capability flags** (every preview response):

For Stage B (HTTP/static):

- `capabilities.discoveryEngine` = `"cloudflare_worker_http_fetch_static_html"`
- `capabilities.trueBrowserDiscoveryAvailable` = `false`
- `capabilities.jsRenderedAppRisk` = `true`
- `capabilities.stageCRequiredForRealBrowserDiscovery` = `true`

For Stage C (real-browser preview, after the Hetzner runner completes):

- `capabilities.discoveryEngine` = `"testorax_runner_playwright_preview"`
- `capabilities.trueBrowserDiscoveryAvailable` = `true`
- `capabilities.jsRenderedAppRisk` = `false` (only when ALL selected
  routes were browser-inspected; otherwise `true` as a worst-case)
- `capabilities.stageCRequiredForRealBrowserDiscovery` = `false`
- `capabilities.previewJobId` = `prv_<22 chars>`
- `capabilities.previewJobStatus` ∈ `queued` · `running` · `complete` ·
  `failed` · `capped` · `feature_disabled`

Always:

- `capabilities.authenticatedDiscoveryAvailable` = `false` (until Stage D)
- `capabilities.chromeParityAvailable` = `false` (HARD until Chrome Live ships)
- `capabilities.realBrowserFallbackReason` (when set) ∈ closed set:
  `feature_flag_disabled` · `capacity_exhausted` · `timeout_exceeded` ·
  `auth_required_no_profile` · `cap_exceeded` · `safety_violation` ·
  `engine_error`

**Stage F — Chrome parity + runner-mismatch classification (shipped)**:
preview jobs now carry a parity surface. After the runner completes,
the worker compares the static control inventory captured at queue time
against the runner's hydrated control inventory and emits per-route +
job-level parity blocks.

- `GET /api/campaigns/preview/jobs/:id` returns `jobParity` +
  `routeParities[]`.
- `GET /api/campaigns/preview/jobs/:id/progress` returns a compact
  `parityHint` once the job completes.
- MCP `campaign_parity` tool returns the parity-only projection.
- CLI `testorax campaign job <id>` renders the parity panel.

`parityStatus` closed-set: `not_checked` · `internal_compared` ·
`chrome_checked` · `mismatch_detected` · `insufficient_evidence`.
**`chrome_checked` is NEVER emitted in Stage F** — Chrome Live is not
wired in this environment.

`mismatchClassification` closed-set (12): `none_detected` ·
`static_vs_browser_mismatch` · `runner_vs_chrome_mismatch` ·
`selector_mismatch` · `route_mismatch` · `hydration_mismatch` ·
`auth_state_mismatch` · `viewport_mismatch` ·
`click_actionability_mismatch` · `network_environment_mismatch` ·
`chrome_confirmation_needed` · `insufficient_evidence`.

**Agent rules**: a `mismatch_detected` is NOT automatically an app bug.
Read `mismatchClassification` + `runnerMismatchPossible` before patching
production code. When `chromeConfirmationRequired: true`, do not patch
without confirming the route in real Chrome first. When
`mismatchClassification === 'static_vs_browser_mismatch'`, use the
real-browser depth — the static preview undercounted a JS app; this is
NOT an app fix. Never claim Chrome parity.
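
The agent rules above can be sketched as a small decision function. This is an illustrative sketch only: it assumes the four fields sit flat on a per-route parity object, and the returned action labels are hypothetical names, not values emitted by the API.

```python
from typing import Any, Dict


def parity_next_action(route_parity: Dict[str, Any]) -> str:
    """Map a route parity block to a next step (labels are illustrative)."""
    if route_parity.get("parityStatus") != "mismatch_detected":
        return "no_action"
    # Chrome confirmation takes priority: never patch before confirming.
    if route_parity.get("chromeConfirmationRequired") is True:
        return "confirm_in_real_chrome"
    # The static preview undercounted a JS app; this is not an app fix.
    if route_parity.get("mismatchClassification") == "static_vs_browser_mismatch":
        return "use_real_browser_depth"
    if route_parity.get("runnerMismatchPossible") is True:
        return "investigate_runner_before_patching"
    return "triage_as_possible_app_issue"
```

The ordering encodes the rule that a `mismatch_detected` is not automatically an app bug: every runner-side explanation is checked before the route is triaged as an app issue.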

**Stage G.1 — Testorax Chrome Proof Bridge extension MVP (shipped)**:
the companion Chrome extension lives at `apps/chrome-proof-bridge/` and
uses the same evidence contract as the Stage G.0 bridge with
`source='chrome_extension'`. Manifest V3, minimal permissions
(`activeTab` + `scripting` + `storage`), host permission limited to
`https://testorax.com/*`. The extension never reads cookies,
localStorage, sessionStorage, Authorization/Bearer headers, passwords
(counted only — `.value` is never read), tokens, request bodies, or
response bodies. Capture happens only on user click — no background
scraping, no auto-capture, no telemetry. The Stage G.0 MCP/CLI bridge
is preserved unchanged for coding-agent / CI flows.

**Stage G.0 — Chrome Live MCP Bridge (shipped)**: real Chrome evidence
can now be captured and compared. Local stdio MCP tool
`chrome_capture_for_job` (and CLI `testorax campaign chrome-capture
<jobId> --target-url <url>`) spawns the operator's installed Google
Chrome stable via raw CDP, opens the target URL, captures a redacted
DOM control inventory, and POSTs it to
`POST /api/campaigns/preview/jobs/:id/chrome-evidence`. The Worker then
emits `parityStatus: 'chrome_checked'` for that route. `chromeParityAvailable`
flips to `true` job-wide.

The hosted MCP deliberately does NOT register `chrome_capture_for_job`
because it cannot drive a real Chrome on the operator's machine. The
ingestion endpoint rejects every forbidden field by NAME — `cookies`,
`cookieHeader`, `storageState`, `localStorage`, `sessionStorage`,
`authorization`, `bearer`, `password`, `accessToken`, `refreshToken`,
`idToken`, `apiKey`, `secret`, `csrfToken`, `requestBodies`,
`responseBodies` — before validating values. The bridge never reads
cookies, localStorage, sessionStorage, Authorization headers, passwords,
or request/response bodies; the CDP evaluate script walks visible DOM
controls only.

These flags are honest about what the engine did NOT do. Authenticated
discovery + Chrome parity execution + sandbox mutation flow ship in
Stage D.

**Campaign scope** (Stage B): caller may narrow which routes count
toward the page-run cost via `scope` + `module` + `routes`. Closed-set
scope (4): `all` · `module` · `route_group` · `route_only`. Module ids
(21, closed): `marketing` · `auth` · `dashboard` · `admin` · `pos` ·
`kds` · `dispatch` · `riders` · `orders` · `customers` · `inventory` ·
`products` · `finance` · `billing` · `wallet` · `apps` · `settings` ·
`storefront` · `booking` · `docs` · `unknown`. Response includes
`moduleOptions[]` with per-module page-run + run-credit estimates and
`appliedScope` echo.

**Per-route control summary** (Stage B): `controlInventory.byRouteSummary[]`
returns one `RouteControlSummary` per discovered route with typed counts
(`buttons`, `links`, `dropdowns`, `dropdownOptions`, `tabs`, `forms`,
`inputs`, `searchControls`, `filters`, `sorters`, `tables`,
`paginationControls`, `modalsOrDrawers`, `fileUploads`,
`destructiveActions`, `paymentActions`, `emailSendActions`,
`unknownControls`) plus `discoveryMethod`, `routeStatus`, `parityStatus`
(`not_checked`), `parityReason`.

**Parity placeholder** (Stage B): `parity.runnerMismatchPossible`,
`parity.chromeConfirmationRecommended`, `parity.parityStatus` (closed
set: `not_checked` · `not_required` · `recommended`), `parity.parityReason`.
Stage B NEVER claims Chrome parity was checked.

**Agent rules**:

- Do NOT trigger any run from a Stage A preview. Always show the customer
  `estimatedCredits` and `estimatedCostLabel` first.
- Treat `destructive_blocked`, `payment_blocked`, `email_send_blocked`
  controls as never-clickable in Stage A.
- Treat `unknown_needs_review` as out-of-scope until classified.
- Preserve the `riskWarnings` and `willNotBeTested` arrays in any
  customer-facing summary.

## Assertion helpers (DSL)

Compose tests from a closed set of **19 helper kinds**: `page_should_load`,
`text_should_appear`, `text_should_not_appear`, `button_should_be_clickable`,
`form_should_submit`, `url_should_include`, `element_should_exist`,
`element_should_not_exist`, `console_should_be_clean`, `network_should_be_clean`,
`no_4xx_5xx_requests`, `no_visible_error`, `toast_should_appear`,
`table_should_have_rows`, `modal_should_open`, `modal_should_close`,
`input_should_accept_text`, `select_should_change`, `checkbox_should_toggle`.

Each helper compiles to existing engine `TestStep` actions only: no new
`StepAction` values, no executor change. To validate a single helper:

`POST https://testorax.com/api/scenario-templates/validate-assertion`
Body: `{ "helper": { "kind": "<one of 19>", ...fields } }`.
On 400 the response includes `code`, `field?`, `message`, and an
`exampleValid` showing the correct shape.

Refusal codes (HTTP 400): `unknown_helper_kind` / `missing_required_field` /
`invalid_field_shape` / `forbidden_value_shape` / `too_many_helpers`.
The validator rejects literal cookie / Bearer / JWT / vendor secret key /
password patterns. 50-helper-per-scenario hard cap.
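
Building the validation body can be sketched as below. This is a minimal sketch under assumptions: `build_validate_body` and `check_helper_count` are hypothetical local helpers, the local kind check only mirrors the `unknown_helper_kind` refusal, and the cap is read as "50 allowed, 51 refused".

```python
import json
from typing import Any, List, Optional

# The 19 helper kinds listed above.
HELPER_KINDS = frozenset({
    "page_should_load", "text_should_appear", "text_should_not_appear",
    "button_should_be_clickable", "form_should_submit", "url_should_include",
    "element_should_exist", "element_should_not_exist", "console_should_be_clean",
    "network_should_be_clean", "no_4xx_5xx_requests", "no_visible_error",
    "toast_should_appear", "table_should_have_rows", "modal_should_open",
    "modal_should_close", "input_should_accept_text", "select_should_change",
    "checkbox_should_toggle",
})
MAX_HELPERS_PER_SCENARIO = 50  # assumed reading of the hard cap


def build_validate_body(kind: str, **fields: Any) -> str:
    """Serialize the documented `{ "helper": { "kind", ...fields } }` shape."""
    if kind not in HELPER_KINDS:
        raise ValueError(f"unknown_helper_kind: {kind}")
    return json.dumps({"helper": {"kind": kind, **fields}}, sort_keys=True)


def check_helper_count(helpers: List[dict]) -> Optional[str]:
    """Return the refusal code when a scenario exceeds the helper cap."""
    return "too_many_helpers" if len(helpers) > MAX_HELPERS_PER_SCENARIO else None
```

For example, `build_validate_body("text_should_appear", text="Welcome back")` produces a body suitable for the `validate-assertion` endpoint; field-level checks are left to the server-side validator.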

Companion: 12 new scenario templates added to
`/api/scenario-templates` across 6 new categories (`dashboard`, `admin`,
`proof`, `fix_check`, `booking`, `console_clean`). Total registry: 57
templates. The generator refuses 0-step output as
`invalid_template_generation`.

Full walkthrough at https://testorax.com/docs/scenario-templates

### MCP + CLI helper

Use these BEFORE submitting a custom-scenario run to catch shape errors
locally — no API key, no run created, no quota burn.

**MCP tool** (both stdio + hosted): `validate_assertion`. Args
`{ kind, fields }`. Returns `{ ok, contractVersion, compiledSteps }` on
success; `{ ok:false, code, field?, message, exampleValid }` on failure.
Same DSL surface as the REST endpoint — single source of truth in
`packages/shared/src/assertionDsl.ts`.

**CLI command**: `testorax assertion validate <kind> [--field key=value …]`,
or `testorax assertion validate --json '{...}'`, or
`testorax assertion validate --file helper.json`, or
`echo '{...}' | testorax assertion validate --stdin`. Alias: `compile`.
Exit codes: 0 OK, 2 usage, 3 invalid helper (with corrected example
printed). NO API key required.

**Copy-paste examples for coding agents:**

Valid helper:
```bash
testorax assertion validate text_should_appear --field text="Welcome back"
```

Invalid helper (returns corrected example):
```bash
testorax assertion validate text_should_appear
# → ✗ Invalid helper (missing_required_field)
#   field: text
#   Corrected example: { "kind": "text_should_appear", "text": "Welcome back" }
```

Console / network clean check:
```bash
testorax assertion validate console_should_be_clean --field allowWarnings=false
testorax assertion validate no_4xx_5xx_requests
```


## Campaign Fix Check loop (Stage J)

After a Stage I campaign produces issues + AI Fix Prompts, run a Fix Check
on the deployed Hetzner runner to verify that the fix actually landed. It
reuses the Stage H campaign-page-run runner; no new infrastructure.

| Endpoint | Purpose |
|----------|---------|
| `POST /api/campaigns/:id/fix-check` | Start a Fix Check (1 run credit per page-run; admin bypass). Idempotent on (sourceCampaignId, scope, sourceXxxId). |
| `GET /api/fix-checks/:id` | Full status + verdict + before/after comparison. |
| `GET /api/fix-checks/:id/progress` | Compact progress; isFinal=true when verdict is stable. |

**13-verdict closed set:** `verified_fixed`, `still_failing`, `partially_fixed`,
`regression_detected`, `inconclusive`, `blocked_auth`, `blocked_safety`,
`runner_mismatch_possible`, `chrome_confirmation_needed`,
`chrome_checked_match`, `chrome_checked_mismatch`,
`insufficient_evidence`, `not_rerun_yet`.

**5-scope closed set:** `issue`, `page_run`, `patch_batch`,
`failed_routes`, `full_campaign`.

**MCP tools:**
- `fix_check_start` — start a Fix Check.
- `fix_check_status` — full status + verdict + comparison.
- `fix_check_progress` — compact progress snapshot.
- `fix_check_verdict` — trimmed verdict-only view.

**CLI:**
- `testorax campaign fix-check <campaignId> --issue <id>|--page-run <id>|--patch-batch <id>|--failed-routes|--full-campaign`
- `testorax campaign-fix-check status <fixCheckId>`
- `testorax campaign-fix-check watch <fixCheckId> --json`

**Honesty rules (do not bypass):**
- Do not claim the issue is fixed unless `verdict=verified_fixed`.
- Do not bypass the UI by editing data, calling internal endpoints, or hitting the database.
- Do not remove auth gates, billing gates, run-credit deductions, or capacity protections.
- Do not weaken assertions, delete the failing scenario, or change expected status to make the test pass.
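
The first honesty rule reduces to a single comparison. A minimal sketch follows; `may_claim_fixed` and `summarize_verdict` are hypothetical helper names, and the summary wording is illustrative.

```python
# The 13-verdict closed set listed above.
FIX_CHECK_VERDICTS = frozenset({
    "verified_fixed", "still_failing", "partially_fixed", "regression_detected",
    "inconclusive", "blocked_auth", "blocked_safety", "runner_mismatch_possible",
    "chrome_confirmation_needed", "chrome_checked_match", "chrome_checked_mismatch",
    "insufficient_evidence", "not_rerun_yet",
})


def may_claim_fixed(verdict: str) -> bool:
    """Only `verified_fixed` permits a 'fixed' claim; unknown verdicts are rejected."""
    if verdict not in FIX_CHECK_VERDICTS:
        raise ValueError(f"verdict outside the closed set: {verdict}")
    return verdict == "verified_fixed"


def summarize_verdict(verdict: str) -> str:
    """Produce a summary line that never overstates the result."""
    if may_claim_fixed(verdict):
        return "Fixed (verdict=verified_fixed)"
    return f"Not verified as fixed (verdict={verdict})"
```

Note that `partially_fixed` and `chrome_checked_match` both return `False` here: neither is `verified_fixed`, so neither supports a fixed claim.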

Full walkthrough: https://testorax.com/docs/fix-check


## Marketing + discoverability pages (2026-05-07)

Public, agent-readable pages for explaining Testorax positioning, trust, integrations, and proof contract. All return 200 to bots without auth.

**Trust + how-it-works:**
- https://testorax.com/is-testorax-legit
- https://testorax.com/how-testorax-works
- https://testorax.com/no-code-access-needed
- https://testorax.com/safe-browser-testing

**Proof contract surfaces:**
- https://testorax.com/proof-packet — closed-set verdicts + doNotClaim guards
- https://testorax.com/ai-fix-prompt — 7 closed branches; standard_fail / test_design / provider_environment / low_trust / inconclusive / coverage_partial / pass
- https://testorax.com/fix-check — 13 closed verdicts; only `verified_fixed` counts

**Agent integrations (positioning, not docs):**
- https://testorax.com/testorax-for-claude-code
- https://testorax.com/testorax-for-cursor
- https://testorax.com/testorax-for-codex
- https://testorax.com/mcp-testing-for-ai-agents — full MCP tool catalog (~70 tools)

**Compare:**
- https://testorax.com/alternatives
- https://testorax.com/testorax-vs-testsprite
- https://testorax.com/testorax-vs-magicpod
- https://testorax.com/testorax-vs-bugbug
- https://testorax.com/testorax-vs-reflect

**Future-track docs:**
- https://testorax.com/docs/mobile-native-qa — Web/PWA/SaaS supported today; native APK/IPA on roadmap, not shipped.

## Refund tooling (Stage P.2 Half 2 — 2026-05-07)

Customer-facing endpoints (X-Api-Key required):
- `POST /api/billing/refunds/request` — body `{paymentId, reason, message?}`. Auto-approves under-$30 unused-purchase refunds; sends >=$30 to manual-review queue. Idempotent on `(paymentId, accountEmail)`. Never double-calls Dodo.
- `GET /api/billing/refunds/:refundId` — single-record status + decision reason.
- `GET /api/account/refunds?email=<email>` — refund history for an account.

Closed status set: `requested | auto_approved | manual_review_required | refund_submitted | refunded | rejected | failed | provider_pending | dispute_open | chargeback_open | duplicate_request | idempotent_replay`.

Closed reason set: `duplicate_payment | technical_failure_no_report | report_inaccessible | report_empty_or_broken | user_unhappy | accidental_purchase | other`.

Agent rules:
- Do not claim a refund completed unless `status === 'refunded'`.
- Do not retry a refund on a `duplicate_request` response; it means a prior identical request is already in the system.
- Do not retry on `manual_review_required`; the request is already queued for human review.
- Do not surface card numbers / CVV / expiry / any PAN-shaped digit run anywhere; the refund record carries 14-pattern defense-in-depth redaction.
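
The status-handling rules above can be sketched as a lookup. This is an illustrative sketch: the returned action labels are hypothetical, and statuses the rules do not mention are simply reported as-is rather than acted on.

```python
# The closed status set listed above.
REFUND_STATUSES = frozenset({
    "requested", "auto_approved", "manual_review_required", "refund_submitted",
    "refunded", "rejected", "failed", "provider_pending", "dispute_open",
    "chargeback_open", "duplicate_request", "idempotent_replay",
})
NO_RETRY_STATUSES = frozenset({"duplicate_request", "manual_review_required"})


def refund_agent_action(status: str) -> str:
    """Map a refund status to an agent action (labels are illustrative)."""
    if status not in REFUND_STATUSES:
        raise ValueError(f"status outside the closed set: {status}")
    if status == "refunded":
        return "report_completed"      # the only status that allows a completion claim
    if status in NO_RETRY_STATUSES:
        return "do_not_retry"
    return "report_status_only"        # e.g. auto_approved is not yet refunded
```

`auto_approved` and `refund_submitted` deliberately fall into `report_status_only`: approval or submission is not the same as `refunded`.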

Admin endpoints (X-Admin-Key gated):
- `GET /api/admin/refunds/queue?status=manual_review_required` — review backlog.
- `POST /api/admin/refunds/:refundId/decide` — body `{decision: 'approve'|'reject', note?: string}`.

Customer dashboard: refunds appear under Account → Wallet → Refunds tab.

## Run-level failure intelligence (Round-2 Ordegate-feedback fix, 2026-05-07)

For Fast Bug Scan runs (mode=fast_bug_scan), the same Stage I/J shape that
campaigns get is now exposed at run-level via these endpoints:

- `GET /api/runs/<runId>/issues` — promoted findings + step failures
- `GET /api/runs/<runId>/proof-packets` — per-issue proof packet stubs (full AI Fix Prompt is in /proof-pack.json)
- `GET /api/runs/<runId>/ai-fix-prompts` — per-issue fix prompt stubs
- `GET /api/runs/<runId>/patch-batches` — grouped by route+kind, `likelyFiles: []` (never hallucinated at run-level)
- `GET /api/runs/<runId>/baseline` — run baseline + intel classifications + risks
- `GET /api/runs/<runId>/rerun-plan` — recommended next actions, `doNotAutoExecute: true`
- `GET /api/runs/<runId>/failure-intelligence` — composite of all the above
- `GET /api/runs/<runId>/events` — SSE stream (2s tick, 60s wall cap, closes on completed/failed)

Auth: same as `/api/runs/:id/intelligence` (X-Admin-Key OR session cookie OR X-Api-Key with `proof:read` scope; non-admin keys cross-tenant guarded).

### New runner-limitation classifications (NOT app bugs)

Click classifier now distinguishes runner-side failures from real app bugs:

- **`protocol_handler`** — clicks on `tel:`, `mailto:`, `sms:`, `facetime:`, `data:`, `callto:`, `skype:`, `whatsapp:`, `viber:` always abort in headless Chromium because the OS app can't open. severity=info. NEVER promoted to a finding.
- **`external_link_navigation`** — clicks where target host !== run host AND an abort signal fires. severity=info. NEVER promoted to a finding.
- **3rd-party tracker request aborts** — google-analytics, facebook, hotjar, mixpanel, segment, amplitude, linkedin, fullstory, heap, intercom, drift, bing, yandex, clarity, datadog, sentry, newrelic, doubleclick, googlesyndication aborts are filtered OUT of the failure-count BEFORE classification, so they don't drive `network_error` or `http_error` upgrades.

Agent rule: **do not patch app code based on protocol_handler / external_link_navigation / tracker-failure findings**. They are runner limitations only.
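
For triage tooling, the two click classifications can be approximated on the client side. This is a rough sketch, not the runner's classifier: the function names are hypothetical, the `aborted` flag stands in for whatever abort signal the runner observes, and host comparison is a plain case-insensitive string match.

```python
from urllib.parse import urlsplit

# Schemes listed above under `protocol_handler`.
PROTOCOL_HANDLER_SCHEMES = frozenset({
    "tel", "mailto", "sms", "facetime", "data", "callto", "skype", "whatsapp", "viber",
})
RUNNER_LIMITATIONS = frozenset({"protocol_handler", "external_link_navigation"})


def is_protocol_handler_href(href: str) -> bool:
    """True when the link targets an OS-level protocol handler."""
    return urlsplit(href).scheme in PROTOCOL_HANDLER_SCHEMES


def is_external_link_navigation(href: str, run_host: str, aborted: bool) -> bool:
    """True when the target host differs from the run host AND an abort fired."""
    host = urlsplit(href).hostname
    return aborted and host is not None and host.lower() != run_host.lower()


def is_runner_limitation(classification: str) -> bool:
    """Runner limitations must not drive app-code patches."""
    return classification in RUNNER_LIMITATIONS
```

Relative links such as `/pricing` have no hostname and are therefore never treated as external.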

## Round-2 Ordegate-audit fixes (2026-05-07)

Four contract gaps from the round-2 audit have been patched:

### 1. Run-level Fix Check idempotency (Attack 6 — was P0 billing risk)

`POST /api/runs/<id>/fix-check` now dedupes across THREE windows:
- **In-flight**: any verify run for the same `priorRunId+email` in `queued|running|pending_payment` → returned with `idempotent:true, creditCost:0`
- **Recently-completed (30 min)**: any verify run for the same `priorRunId+email` completed in the last 30 minutes → same dedup
- **Explicit idempotencyKey (24h)**: optional `body.idempotencyKey` (1-128 chars; letters, digits, `_`, `-`, `:`, `.` only, i.e. `[A-Za-z0-9_:.-]`) extends dedup to 24h

Body schema is now strict — closed allowed-fields set: `{idempotencyKey?}`. Unknown fields → `400 unknown_fields` (was: silently dropped).
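
A client can check the body locally before calling the endpoint. The sketch below assumes the key charset is letters, digits, `_`, `-`, `:` and `.`; `check_fix_check_body` is a hypothetical helper, and `invalid_idempotency_key` is an illustrative code (only `unknown_fields` is named in this guide).

```python
import re
from typing import Optional

# 1-128 chars from the allowed set; the hyphen is escaped so it is a literal.
IDEMPOTENCY_KEY_RE = re.compile(r"[A-Za-z0-9_:.\-]{1,128}")
ALLOWED_BODY_FIELDS = frozenset({"idempotencyKey"})


def check_fix_check_body(body: dict) -> Optional[str]:
    """Return a refusal code for a bad body, or None when it looks acceptable."""
    if set(body) - ALLOWED_BODY_FIELDS:
        return "unknown_fields"
    key = body.get("idempotencyKey")
    if key is None:
        return None  # the key is optional
    if not isinstance(key, str) or IDEMPOTENCY_KEY_RE.fullmatch(key) is None:
        return "invalid_idempotency_key"  # hypothetical code, not from the API
    return None
```

An empty body passes, since `idempotencyKey` is optional and the in-flight and 30-minute windows apply without it.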

Response always includes `sameTestLockEnforced: true` and (when supplied) the `idempotencyKey` echoed back.

Reasons returned: `verify_run_already_in_flight | recent_verify_run_within_dedup_window | idempotency_key_matched`.

Agent rule: **firing 5 Fix Check requests in quick succession will NOT bill 5 credits.** The first call mints + charges; the next 4 dedup to the same `verifyRunId` with `creditCost:0`.

### 2. Stage B preview → stateful campaign flow discovery (Attack 7 — was P1 contract gap)

`POST /api/campaigns/preview` is intentionally stateless and does NOT mint a `previewJobId`. The response now includes `nextStepHint.statefulCampaignFlow` describing the 4-step executable chain:

```
1. POST /api/campaigns/preview/jobs           → mints prv_<22>
2. GET  /api/campaigns/preview/jobs/<jobId>   → poll until completed
3. POST /api/campaigns/preview/jobs/<jobId>/confirm → mints cmp_<22>
4. GET  /api/campaigns/<campaignId> + /events  → live progress
```

Agent rule: **the stateless Stage B preview is for inventory discovery only. For executable campaigns, follow the 4-step chain.**

### 3. login_memory + session_injection now in canonical validator (was P2 docs gap)

`packages/shared/src/auth.ts` `KNOWN_STRATEGIES` was missing `login_memory` and `session_injection` even though they were in `runSubmissionSchema.ts` AND `/docs/login-memory`. Validator now accepts both:
- `auth.strategy: 'login_memory'` requires `authSessionId: 'login_<22 chars>'`. Forbids combining with raw cookies/storageState/password/loginForm fields.
- `auth.strategy: 'session_injection'` requires `cookies[]` AND explicit `allowSessionInjection: true`.

New error codes: `auth_login_memory_required`, `auth_session_injection_unsafe`, `auth_mixed_strategies`.
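
The two strategy rules can be sketched as follows. This is not the code in `packages/shared/src/auth.ts`: the mapping of each error code to a condition is an assumption, as is the `[A-Za-z0-9_-]` charset for the 22-character profile id.

```python
import re
from typing import Optional

LOGIN_ID_RE = re.compile(r"login_[A-Za-z0-9_-]{22}")  # charset is an assumption
RAW_AUTH_FIELDS = ("cookies", "storageState", "password", "loginForm")


def validate_auth(auth: dict) -> Optional[str]:
    """Return an error code for an invalid auth block, or None when it passes."""
    strategy = auth.get("strategy")
    if strategy == "login_memory":
        # A profile reference must not be combined with raw secret fields.
        if any(field in auth for field in RAW_AUTH_FIELDS):
            return "auth_mixed_strategies"
        if LOGIN_ID_RE.fullmatch(str(auth.get("authSessionId", ""))) is None:
            return "auth_login_memory_required"
        return None
    if strategy == "session_injection":
        # Needs a non-empty cookies[] AND an explicit opt-in flag.
        if not auth.get("cookies") or auth.get("allowSessionInjection") is not True:
            return "auth_session_injection_unsafe"
        return None
    return None  # other strategies are out of scope for this sketch
```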

### 4. Fix Check assertionsHash visibility (Attack 2 — was P1 visibility gap)

`POST /api/runs/<id>/fix-check` response now includes `sameTestLockEnforced: true` so agents know the same-test lock will fire on the verdict side BEFORE waiting 30s. The verdict `GET /api/runs/<verifyRunId>/fix-check/result` continues to return `sameTestLock.passed` and `originalFingerprint.assertionsHash` vs `verificationFingerprint.assertionsHash` so agents can verify divergence client-side.

## Round-3 follow-up: post_nav_auth_401 classifier (2026-05-07)

After the Ordegate Round-3 audit confirmed the Round-2 classifier patches landed cleanly (12 false-positive HIGH eliminated on the same input), 4 HIGH issues survived. On inspection, 3 of them were the same shape:

> Click on a marketing CTA → page navigates correctly → resulting page renders + makes 1+ auth-gated API call → API returns 401/403 because the user is anonymous → previously classified as `http_error` HIGH

**This is not a click failure.** The click worked; the 401 is the correct auth boundary.

**NEW classification: `post_nav_auth_401`**
- Fires when: nonTrackerHttpErrors are ALL 401/403, click succeeded (nav landed OR DOM updated), AND no other failure signal present
- **Severity = MEDIUM** (down from HIGH http_error)
- Status = warning (not failed)
- Promoted to a finding so agents see it, but doesn't inflate HIGH counts

The signal is real but separate from the click: the marketing page may not need to call those auth-gated endpoints anonymously. Triage as a "wasteful pre-login API call" finding, not a click failure.
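
The firing condition can be sketched as a pure function. This is an illustrative sketch of the rule as stated above, not the runner's classifier; the parameter names and the shape of the returned record are assumptions.

```python
from typing import Dict, List, Optional


def classify_post_nav_auth(
    non_tracker_http_statuses: List[int],  # HTTP error statuses after tracker filtering
    nav_landed: bool,
    dom_updated: bool,
    other_failure_signal: bool,
) -> Optional[Dict[str, str]]:
    """Return the post_nav_auth_401 record when all three conditions hold."""
    click_succeeded = nav_landed or dom_updated
    all_auth_errors = bool(non_tracker_http_statuses) and all(
        status in (401, 403) for status in non_tracker_http_statuses
    )
    if all_auth_errors and click_succeeded and not other_failure_signal:
        return {
            "classification": "post_nav_auth_401",
            "severity": "medium",
            "status": "warning",
        }
    return None  # fall through to the other classifiers
```

A single 500 mixed in with the 401s, or a click that neither navigated nor updated the DOM, leaves the event to the existing `http_error` path.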

Round-4 result on the same `ordegate.com` input: HIGH count went from **4 → 1**. The only surviving HIGH is the genuine `unreachable_critical_control "Send Message"` finding. 

Cumulative classifier evolution (same input each time):
- Round 1: 16 HIGH (pre-fixes)
- Round 3: 4 HIGH (after external_link_nav + protocol_handler + tracker filtering)
- Round 4: 1 HIGH (after post_nav_auth_401)

**94% reduction in HIGH-severity findings (16 → 1) across 4 rounds, same input, same agent.**

## Round-4 vulnerability-report fixes (2026-05-07)

The agent's deep stress audit (14 attack categories, 30+ live runIds) caught one **P0** and several P1/P3 issues. All fixed:

### P0 — webhookUrl SSRF guard ✅ FIXED

`POST /api/runs/start`, `/api/runs/bypass`, and `/api/runs/live-click-audit` now run `body.webhookUrl` through the same `checkUrlSafety` validator as `body.url`. Loopback (127.0.0.1, ::1), RFC1918 (10/8, 172.16/12, 192.168/16), link-local (169.254/16), IPv6 ULA + private, `file://`, `gopher://`, `javascript:`, and `data:` targets, plus URLs longer than 2048 chars, all return `400` with `field: 'webhookUrl'`. **No run created. No credit deducted.**

Live verified: 5/5 dangerous URLs → 400. Safe URL → 200.
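
A comparable client-side pre-check can be sketched with the standard library. This is not the `checkUrlSafety` implementation: it only inspects literal IP hosts (hostnames that resolve to private addresses are out of scope here), and the returned reason strings are hypothetical.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048
ALLOWED_SCHEMES = frozenset({"http", "https"})


def check_webhook_url(url: str) -> Optional[str]:
    """Return a rejection reason for an unsafe webhook URL, or None if it passes."""
    if len(url) > MAX_URL_LENGTH:
        return "url_too_long"
    try:
        parts = urlsplit(url)
    except ValueError:
        return "malformed_url"
    if parts.scheme not in ALLOWED_SCHEMES:
        return "unsafe_scheme"  # file://, gopher://, javascript:, data:, ...
    host = parts.hostname
    if not host:
        return "missing_host"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return None  # a hostname, not an IP literal; DNS checks are not sketched here
    # is_private covers RFC1918 and IPv6 ULA; loopback and link-local are checked explicitly.
    if address.is_loopback or address.is_private or address.is_link_local:
        return "private_address"
    return None
```

A server-side guard would also need to re-check the resolved address at request time; this sketch only catches the obvious literal cases before a request is made.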

### P1 — SSE per-scenario progress ✅ FIXED

`GET /api/runs/<runId>/events` now emits `progress` events whenever ANY observable field changes (`status`, `currentStep`, `testsTotal`, `testsPassed`, `testsFailed`) — not just status transitions. Heartbeats only fire when nothing changed.

Live verified on a fresh run: 4 progress events captured (initial state → mid-run with `currentStep:"live_click_audit (1 clicks)"` and `testsPassed:1` → completion → final completed envelope) interleaved with 2 heartbeats. Per-scenario progress fires correctly during the `running` phase.

### P1 — Scenario-lint XSS / CRLF / unknown-action / oversize blocks ✅ FIXED

NEW lint codes (always-block, regardless of `policy.strictAssertions`):
- `scenario_name_unsafe_html` — title contains `<script>` / `<iframe>` / `<object>` / `<embed>` / `<svg>` or `javascript:` protocol
- `scenario_name_contains_crlf` — title contains CR/LF
- `scenario_name_too_long` — title > 200 chars
- `unknown_step_action` — step uses an action not in the canonical set
- `selector_too_long` — selector > 1000 chars
- `step_url_contains_crlf` — step URL contains CR/LF (header-injection vector)
- `step_url_unsafe_protocol` — step URL uses javascript:
- `step_url_too_long` — step URL > 2048 chars
- `step_value_too_long` — step value > 4000 chars

Wired into BOTH `/api/runs/start` (already had the lint) AND `/api/runs/bypass` (was missing the lint). Returns `400 scenario_lint_blocked` with `blockers[]` listing the codes. **No run created. No credit deducted.**
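
The always-block lint codes can be sketched as a local pre-flight check. This is an approximation, not the server lint: the scenario field names (`name`, `steps`, `action`, `selector`, `url`, `value`) are assumptions, and `known_actions` is supplied by the caller because the canonical action set is not listed here.

```python
import re
from typing import Iterable, List

UNSAFE_NAME_RE = re.compile(
    r"<\s*(script|iframe|object|embed|svg)\b|javascript:", re.IGNORECASE
)


def _has_crlf(text: str) -> bool:
    return "\r" in text or "\n" in text


def lint_scenario(scenario: dict, known_actions: Iterable[str]) -> List[str]:
    """Return the lint codes that would block this scenario (in first-seen order)."""
    known = set(known_actions)
    blockers: List[str] = []

    def block(code: str) -> None:
        if code not in blockers:
            blockers.append(code)

    name = str(scenario.get("name", ""))
    if UNSAFE_NAME_RE.search(name):
        block("scenario_name_unsafe_html")
    if _has_crlf(name):
        block("scenario_name_contains_crlf")
    if len(name) > 200:
        block("scenario_name_too_long")

    for step in scenario.get("steps", []):
        if step.get("action") not in known:
            block("unknown_step_action")
        if len(str(step.get("selector", ""))) > 1000:
            block("selector_too_long")
        url = str(step.get("url", ""))
        if _has_crlf(url):
            block("step_url_contains_crlf")
        if url.strip().lower().startswith("javascript:"):
            block("step_url_unsafe_protocol")
        if len(url) > 2048:
            block("step_url_too_long")
        if len(str(step.get("value", ""))) > 4000:
            block("step_value_too_long")
    return blockers
```

A non-empty result corresponds to the `blockers[]` array in a `400 scenario_lint_blocked` response, so an agent can fix the scenario before spending a request.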

### P3 — Unknown /api/* paths return structured 404 ✅ FIXED

Unknown `/api/*` paths previously fell through to the SPA HTML shell. They now return `404 endpoint_not_found` JSON with the actual path + advice pointing at /agents.md. Agents can branch on `code: 'endpoint_not_found'`.

### P3 — body.idempotencyKey now honored alongside Idempotency-Key header ✅ FIXED

`POST /api/runs/start` accepts the idempotency key from EITHER:
- `Idempotency-Key` HTTP header (preferred; aligns with the IETF `Idempotency-Key` header draft)
- `body.idempotencyKey` field (fallback for clients that can't set headers)

Header wins when both supplied. Same dedup semantics either way.
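
The precedence rule can be sketched in a few lines. `resolve_idempotency_key` is a hypothetical helper; it assumes headers arrive as a plain dict and compares header names case-insensitively, as HTTP requires.

```python
from typing import Mapping, Optional


def resolve_idempotency_key(headers: Mapping[str, str], body: Mapping[str, object]) -> Optional[str]:
    """Header wins when both are supplied; the body field is the fallback."""
    for name, value in headers.items():
        if name.lower() == "idempotency-key" and value:
            return value
    body_key = body.get("idempotencyKey")
    return body_key if isinstance(body_key, str) and body_key else None
```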

### Honest non-fixes (correct behavior verified)

- **Anonymous /api/campaigns/preview** — 200 with `runBalance: null` is correct. Public preview inventory is a free-audit feature.
- **Rate limit 30/min on /api/campaigns/preview** — already in place; 20-concurrent burst < 30/min cap = correct behavior.
- **Idempotency-Key header (verified working in Round 4 retest)** — 4/4 calls with same key returned same `verifyRunId` with `idempotent: true, creditCost: 0`. Round-2 conclusion that idempotency was broken was wrong; the body field was the issue.

## Round-5 verification + customer-data-plane clean (2026-05-07)

After the Round-4 vulnerability report led to 6 shipped fixes, an external agent re-verified all of them and also pushed into customer-data-plane attacks (cross-tenant Login Memory / refund / run-intel / queue / cancel / stored-XSS / hang).

**Result: 13/13 PROOF HOLDS** (1 of those is PARTIAL EVIDENCE — single-account observation gap, not a product gap).

### Two clarifications for future audits

1. **SSE error envelope is in-band, not HTTP-status-coded.** When you `GET /api/runs/<unknown-id>/events`, the response is `200` with body:
   ```
   event: failed
   data: {"runId":"<id>","code":"run_not_found"}
   ```
   followed by stream close. This is the correct pattern for SSE — once headers are flushed, the only way to deliver an error is in-band. **Don't flag this as a 404 regression.** Other endpoints that don't have an open stream return 404 normally.

2. **Proof-packet `reportContractVersion` field.** The proof-pack JSON carries `reportContractVersion` (currently `"1.9.0"`), NOT a top-level `contractVersion` field. The compact-proof endpoint carries `compactProofVersion`. The intelligence endpoint carries `notes.contractVersion`. Each has a distinct shape — don't assume one field name across surfaces.
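
Because the SSE error arrives in-band, a client has to inspect event frames instead of the HTTP status. The sketch below parses a raw SSE text body with a deliberately small parser; it assumes every `data:` payload is JSON and ignores other SSE fields such as `id:` and `retry:`.

```python
import json
from typing import Any, List, Tuple


def parse_sse_events(raw: str) -> List[Tuple[str, Any]]:
    """Split a raw SSE body into (event type, decoded data) pairs."""
    events: List[Tuple[str, Any]] = []
    for frame in raw.replace("\r\n", "\n").split("\n\n"):
        event_type, data_lines = "message", []
        for line in frame.split("\n"):
            if line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            events.append((event_type, json.loads("\n".join(data_lines))))
    return events


def is_run_not_found(events: List[Tuple[str, Any]]) -> bool:
    """Detect the in-band `run_not_found` failure described above."""
    return any(
        event_type == "failed" and isinstance(data, dict) and data.get("code") == "run_not_found"
        for event_type, data in events
    )
```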

### Honest non-features confirmed by the audit

- **`/api/runs/<id>/cancel` does not exist.** Cancel is body-driven via `POST /api/runs/cancel` with `{runId}` in body. The path-driven variant returns `404 endpoint_not_found`. This is the canonical shape — there is no plan to add the path-driven variant.
- **`/api/runs/start` with mode=`fast_bug_scan` silently drops `customScenarios[]`.** Custom scenarios for fast bug scan are honored only via `/api/runs/bypass`. The runner does its own discovery for `/api/runs/start`. This is intentional and is part of why the Round-4 stored-XSS payload couldn't reach the renderer.
- **Hang-shaped scenarios (e.g. `wait 600000`) converge to `authoritativeOutcome: 'inconclusive'` in well under the wall-clock cap.** The proof contract refuses to make a confident claim from thin evidence — `aiFixPrompt: null`, `doNotClaim` populated with explicit guards, `trustScore: 20`. This is the proof contract working as designed under adversarial input.

### Cumulative product safety progression across 5 rounds

| Round | Focus | Result |
|---|---|---|
| 1 | Initial autonomous flow | 19 issues, 16 false-positive HIGH |
| 2 | Proof-contract attacks | 1 P0 idempotency reported (later corrected — header works, body field didn't) |
| 3 | Re-run after classifier patches | 7 issues, 4 HIGH (false positives -75%) |
| 4 stress | 14 attack categories, 30+ runIds | 1 P0 SSRF + 3 P1 + 4 P3 found and fixed |
| 5 verify | Re-verify R4 + customer-data plane | 13/13 PROOF HOLDS, 0 new vulns, agent confirms launch readiness |

**94% cumulative reduction in HIGH-severity findings (16 → 1).** The single surviving HIGH on `ordegate.com` was a genuine `unreachable_critical_control` finding to be Chrome-verified.

The autonomous-flow API is launch-ready from the agent-integration surface. Remaining work is Stage C/D real-browser + auth-bound discovery so SPA dashboards stop appearing as single-route shells.

## Round-6 P0 race fix — concurrent Idempotency-Key dedup (2026-05-07)

The Round-6 stress audit (`B1` probe) found a P0 race condition: 10 concurrent `POST /api/runs/start` requests with the SAME `Idempotency-Key` minted 10 distinct billable runIds. The dedup was sequentially-safe (Round-5 verified single-call replays work) but had a 100ms-2s race window on concurrent requests.

### Root cause

The previous design did:
1. SELECT idempotency_keys (early)
2. … 100ms-2s of validation, billing, run insertion …
3. INSERT OR IGNORE idempotency_keys (late)

All 10 concurrent requests SELECT-missed at step 1 (no row yet), proceeded through step 2 (each minting its own runId, charging its own credit, inserting its own run row), and reached step 3. The UNIQUE constraint silently IGNORE'd 9 of the 10 inserts, but by then 10 distinct runs already existed in the `runs` table.

### Fix

The idempotency_keys INSERT now lives INSIDE the same atomic D1 batch as the run / wallet-charge / regression-targets inserts. The INSERT uses plain `INSERT INTO idempotency_keys` (no OR IGNORE), so a UNIQUE-constraint violation on `(key, email, endpoint)` ABORTS the entire batch: the losing request's run row, wallet debit, and other ops all roll back atomically. The handler catches the constraint violation, re-SELECTs to find the winning request's runId, and returns it as a replay.

Same fix applied to `/api/runs/bypass` (slot reserved up-front before the run-insert pipeline; on concurrent loss, the winner's runId is returned).

### Verified live

```
10 × POST /api/runs/start with same Idempotency-Key:
  REQ 0 → runId=ipHqHK5rv7Q-DA3QCyg9d, idempotentReplay:false  (the winner)
  REQ 1-9 → runId=ipHqHK5rv7Q-DA3QCyg9d, idempotentReplay:true  (race-resolved replays)
  uniqueRunIds: 1 / 10
```

Sequential dedup still works:
```
3 × POST /api/runs/start with same Idempotency-Key:
  call 1 → runId=VleDz61SmmJRGysWq3rnj, idempotentReplay:false
  call 2 → runId=VleDz61SmmJRGysWq3rnj, idempotentReplay:true
  call 3 → runId=VleDz61SmmJRGysWq3rnj, idempotentReplay:true
```

Different body + same key still 409:
```
POST with differing body → 409 idempotency_conflict + existingRunId + idempotencyKeyHashPrefix
```

### Why this matters for billing

A buggy SDK retry (5s timeout, retry-on-network-error) firing 3-5 concurrent identical requests would have been billed 3-5× before this fix. Production-grade clients with at-least-once delivery semantics can now safely use `Idempotency-Key` for retries.
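
On the client side, the key must be minted once and reused across every retry of the same logical request. A minimal sketch, assuming a caller-supplied `send(headers, body)` transport function (hypothetical; swap in your HTTP client) that raises `ConnectionError` on network failure:

```python
import uuid
from typing import Callable

def start_run_with_retries(
    send: Callable[[dict, dict], dict],
    body: dict,
    max_attempts: int = 3,
) -> dict:
    """Retry a run-start request while reusing a single Idempotency-Key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # Mint the key ONCE, outside the loop. A fresh key per attempt would
    # defeat server-side dedup and could bill every retry.
    headers = {"Idempotency-Key": str(uuid.uuid4())}

    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers, body)
        except ConnectionError as exc:
            last_error = exc  # network error: retry with the same key
    raise last_error
```

The body must also stay byte-for-byte identical between attempts, since a differing body under the same key yields `409 idempotency_conflict`.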

### Cumulative round progression

| Round | Surface | Result |
|---|---|---|
| 1 | Initial autonomous flow | 19 issues, 16 false-positive HIGH |
| 2 | Proof-contract attacks | 0 found (later corrected) |
| 3 | Re-run after classifier | 7 issues, 4 HIGH (FP -75%) |
| 4 | 14 attack categories stress | 1 P0 SSRF + 3 P1 + 4 P3 found+fixed |
| 5 | Re-verify R4 + customer-data | 13/13 PROOF HOLDS |
| 6 | Billing + concurrency | **1 P0 race found+fixed**, 0 customer-data plane issues |

The P0 race is now closed. The agent's recommendation to apply the atomic INSERT-OR-IGNORE pattern from `/api/runs/<id>/fix-check` to `/api/runs/start` is what shipped, slightly evolved: the same atomic-batch approach, but with explicit UNIQUE-constraint-violation detection so the winner's runId is returned rather than the duplicate being silently dropped.

## Stage C/D LIVE — real-browser + auth-bound discovery (2026-05-07)

After 5 rounds of external audit calling out Stage C/D as the largest coverage gap, the feature flag `STAGE_C_BROWSER_PREVIEW_ENABLED` is now flipped to `true` in production. The code had been complete since the May 2026 milestone; only the env var was holding it back.

### What's now live

**Stage C — real-browser preview**:
- `POST /api/campaigns/preview` with `discoveryDepth: 'real_browser_preview'` mints a `prv_<22>` job ID
- Hetzner Playwright runner picks it up, navigates each route in real Chromium, waits for hydration, walks the visible DOM
- Result delivered via `GET /api/campaigns/preview/jobs/<previewJobId>` polling
- `discoveryEngine` flips from `cloudflare_worker_http_fetch_static_html` (Stage B) to `testorax_runner_playwright_preview` (Stage C)
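
The poll step can be sketched as a small loop. This is an assumption-laden sketch: `fetch_job` is a hypothetical callable wrapping `GET /api/campaigns/preview/jobs/<previewJobId>`, and `complete` is the only status value this guide confirms, so every other status is simply treated as "not finished yet".

```python
import time
from typing import Callable

def poll_preview_job(
    fetch_job: Callable[[str], dict],
    preview_job_id: str,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a preview job until it reports status == 'complete'."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(preview_job_id)
        if job.get("status") == "complete":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"preview job {preview_job_id} not complete after {timeout_s}s"
            )
        sleep(interval_s)  # injectable so tests do not actually wait
```

The 60-second default mirrors the runner's total wall-clock cap; a client may want a slightly longer budget to allow for queueing.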

**Stage D — authenticated discovery**:
- Pass `authSessionId: 'login_<22>'` (canonical) OR `authProfileId` (legacy alias) in the preview body
- Worker validates ownership + status + expiry + email match (cross-tenant guarded)
- Hetzner runner fetches the encrypted vault blob, decrypts with KEK, injects `storageState` into Playwright BrowserContext
- Post-login DOM walk surfaces authenticated-only controls
- Response carries `capabilities.authenticatedDiscoveryAvailable: true` + `routesInspected[].authDiscoveryStatus: 'authenticated'`
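
Before reporting authenticated coverage, an agent can check both signals from the last bullet. A minimal sketch, assuming the response nests them exactly as `capabilities.authenticatedDiscoveryAvailable` and `routesInspected[].authDiscoveryStatus`; the helper name is hypothetical.

```python
def auth_discovery_confirmed(response: dict) -> bool:
    """True only if the capability flag is set AND every inspected route
    reports authDiscoveryStatus == 'authenticated'."""
    available = (
        response.get("capabilities", {}).get("authenticatedDiscoveryAvailable")
        is True
    )
    routes = response.get("routesInspected", [])
    return (
        available
        and bool(routes)  # an empty route list proves nothing
        and all(r.get("authDiscoveryStatus") == "authenticated" for r in routes)
    )
```

Requiring every route to be authenticated is a deliberately strict choice: one route that fell back to anonymous discovery should downgrade the claim for the whole preview.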

### Live verification (2026-05-07)

**Stage C end-to-end** on JS-rendered fixture:
```
POST /api/campaigns/preview {discoveryDepth: 'real_browser_preview', targetUrl: '...js-rendered'}
  → previewJobId: prv_9uV0OpxIv4c2WiQxln-TgL
  → polled, status=complete in <8s
  → totalControls: 13 (Stage B saw 0)
  → discoveryMethod: real_browser_preview
  → discoveryEngine: testorax_runner_playwright_preview
  → invariantsHonored: all 9 safety flags = false (consumeQuota / submitForms / clickDestructive / etc)
```

**Stage D end-to-end** on auth-rendered fixture:
```
POST /api/internal/test-auth-profile/seed (X-Internal-Secret-gated) → seeds login_<22>
POST /api/campaigns/preview {authSessionId: 'login_<22>', discoveryDepth: 'real_browser_preview'}
  → previewJobId: prv_fBci6T2YHOrkAAoBi1OJ6z
  → polled, status=complete
  → totalControls: 10 (anonymous Stage C saw 2 — auth surfaced 8 additional dashboard controls)
  → authDiscoveryStatus: authenticated
  → useAuthProfile: true
  → authenticatedDiscoveryAvailable: true
```

### Stage C invariants (HARD — runner enforces)

- `consumeQuota: false`
- `chargedRunCredit: 0`
- `submitForms: false`
- `clickDestructive: false`
- `clickPayment: false`
- `sendEmail: false`
- `uploadFiles: false`
- `expandDropdowns: false`
- `openModals: false`

Stage C is read-only. It walks the DOM, extracts controls, classifies them. It NEVER clicks, fills, submits, or mutates anything. The preview is exactly that — a preview. Confirmation + execution is a separate `campaign_execute` step.
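
A client can re-check the nine invariants on its side rather than trusting the preview blindly. This sketch assumes the preview result exposes them as a flat mapping of flag name to value (the exact shape of `invariantsHonored` is not specified here); `invariant_violations` is a hypothetical helper.

```python
# Expected values copied from the invariant list above.
EXPECTED_INVARIANTS = {
    "consumeQuota": False,
    "chargedRunCredit": 0,
    "submitForms": False,
    "clickDestructive": False,
    "clickPayment": False,
    "sendEmail": False,
    "uploadFiles": False,
    "expandDropdowns": False,
    "openModals": False,
}

def invariant_violations(reported: dict) -> list[str]:
    """Return one message per missing or mismatched invariant."""
    violations = []
    for name, expected in EXPECTED_INVARIANTS.items():
        if name not in reported:
            violations.append(f"{name}: missing")
            continue
        actual = reported[name]
        # Compare type as well as value: in Python False == 0, so a plain
        # equality check would let a bool stand in for the credit count.
        if type(actual) is not type(expected) or actual != expected:
            violations.append(f"{name}: expected {expected!r}, got {actual!r}")
    return violations
```

Treating a missing flag as a violation keeps the check fail-closed.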

### Caps (hard-enforced by Hetzner runner)

- 3 routes per job by default, 10 hard cap
- 20s per-route timeout
- 60s total wall-clock cap
- 5 screenshots max
- 256 KB output per route

Beyond caps, routes are recorded as `routesNotInspected` with `cap_exceeded` reason. NEVER silently truncated.
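
The route cap and the "never silently truncated" rule can be illustrated as a pure function. This is a sketch, not the runner's code: `plan_routes` and the `{"route", "reason"}` entry shape are assumptions; only the 3-route default, the 10-route hard cap, and the `cap_exceeded` reason come from this guide.

```python
DEFAULT_ROUTES = 3
HARD_CAP_ROUTES = 10

def plan_routes(routes: list[str], requested_max: int = DEFAULT_ROUTES) -> dict:
    """Split routes into inspected and not-inspected under the route cap."""
    # A caller may raise the default, but never past the hard cap.
    limit = max(0, min(requested_max, HARD_CAP_ROUTES))
    return {
        "routesInspected": routes[:limit],
        # Every skipped route is recorded explicitly with its reason.
        "routesNotInspected": [
            {"route": r, "reason": "cap_exceeded"} for r in routes[limit:]
        ],
    }
```

The two lists always add up to the input, so a reader of the result can tell exactly which routes were never looked at.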

### Closed-set fallback reasons (when Stage C declines to run)

`feature_flag_disabled | capacity_exhausted | timeout_exceeded | auth_required_no_profile | cap_exceeded | safety_violation | engine_error`

Each surfaces with a structured `realBrowserFallbackReason` field. Static HTML preview is NEVER silently presented as browser preview.
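
Because the set is closed, a client can validate the field strictly and fail on anything unexpected. A minimal sketch, assuming `realBrowserFallbackReason` is absent or null when Stage C actually ran (an assumption; this guide does not state that); the helper name is hypothetical.

```python
from typing import Optional

FALLBACK_REASONS = frozenset({
    "feature_flag_disabled",
    "capacity_exhausted",
    "timeout_exceeded",
    "auth_required_no_profile",
    "cap_exceeded",
    "safety_violation",
    "engine_error",
})

def parse_fallback_reason(response: dict) -> Optional[str]:
    """Return the validated fallback reason, or None if none was reported."""
    reason = response.get("realBrowserFallbackReason")
    if reason is None:
        return None
    if reason not in FALLBACK_REASONS:
        # Unknown values signal contract drift; surface it loudly.
        raise ValueError(f"unknown realBrowserFallbackReason: {reason!r}")
    return reason
```

When a reason is returned, the result should be described as a static HTML preview, never as a browser preview.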

### Next steps (operator + agent)

1. **Update `/docs/campaign-preview` page** to mark Stage C/D as LIVE (currently still says "Stage C ships in" placeholder language)
2. **Tell external testing agent**: re-run their Round 1 SPA-coverage probe against ordegate.com `dashboard-sandbox` / `pos-sandbox` / `admin-sandbox` — these previously showed 0-3 controls in Stage B. Stage C should now surface real hydrated controls.
3. **For authenticated Ordegate SPA testing**: customer creates a Login Memory entry via `/account/login-memory` for the merchant test account, captures `authSessionId`, agent passes it in preview body. Stage D auth surfaces dashboard controls only logged-in users see.

## Round-7 push toward 9.5/10 — three improvements + one P0 caught (2026-05-07)

The agent's 8.5/10 review flagged 5 specific deductions. Three were already addressed (Stage C/D LIVE, contract unification shipped, race fix verified). Two new items shipped to push toward 9.5+:

### Improvement #1 — Full AI Fix Prompt body inline at /api/runs/<id>/ai-fix-prompts

Previously the endpoint returned per-issue stubs with `hint: "Use /api/runs/<runId>/proof-pack.json"`. The agent flagged this as an extra fetch they shouldn't need.

Now the response carries:
- `aiFixPrompt` — full body inline (≤8 KB per the existing aiFixPrompt.ts hard cap)
- `aiFixPromptBranch` — closed-set: `standard_fail | test_design | provider_environment | low_trust | inconclusive | coverage_partial | pass | no_issue`
- `aiFixPromptLowTrustReason` — populated when prompt is null (explains why)
- `proofPacketContractVersion` — for contract-version negotiation
- `notes.fullPromptInlined` — boolean signal so callers can branch
- `fixPrompts[]` — legacy per-issue stubs preserved for back-compat
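
A caller can branch on these fields roughly as follows. The field names come from the list above; the `select_fix_prompt` helper and the shape of its return value are illustrative assumptions.

```python
def select_fix_prompt(response: dict) -> dict:
    """Choose between the inline prompt, a low-trust explanation,
    and the legacy per-issue stubs."""
    inlined = response.get("notes", {}).get("fullPromptInlined") is True
    prompt = response.get("aiFixPrompt")

    if inlined and prompt:
        # Full body is present: no second fetch needed.
        return {
            "source": "inline",
            "prompt": prompt,
            "branch": response.get("aiFixPromptBranch"),
        }
    if prompt is None and response.get("aiFixPromptLowTrustReason"):
        # Prompt deliberately withheld: report why instead of guessing.
        return {
            "source": "withheld",
            "prompt": None,
            "reason": response["aiFixPromptLowTrustReason"],
        }
    # Older servers or unexpected shapes: fall back to the legacy stubs.
    return {"source": "legacy_stubs", "stubs": response.get("fixPrompts", [])}
```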

### Improvement #2 — Fixture API key seeder for cross-tenant proof testing

NEW endpoint `POST /api/internal/test-api-key/seed` (X-Internal-Secret-gated). The agent's #5 deduction asked for a way to mint a 2nd test API key so cross-tenant isolation could be PROVEN rather than INFERRED from oracle absence.

Mirrors the test-auth-profile/seed contract:
- X-Internal-Secret header gate
- Email MUST be `@testorax-fixtures.local` or `@scope-test.testorax-fixtures.local`
- Plan defaults to `pay_as_you_go`; `studio` and `free` allowed; admin-tier explicitly REJECTED (would break the cross-tenant proof contract)
- Scopes default to `AGENT_RECOMMENDED_SCOPES` (runs:read + runs:start + issues:* + reports:read + chat:* + apps:read + proof:read); operator can override via `body.scopes` CSV
- 24h auto-expire advice (caller cleanup OR daily cron sweep)
- Returns plaintext ONCE (`apiKey: "ttx_<22>"`); same hashing pipeline as production keys
- Real cross-tenant guards apply — fixture keys can probe each other's runIds, refundIds, authSessionIds and every guard returns 404 (not 403) for unowned IDs
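
The request-validation rules above can be sketched as a pure function. This is not the endpoint's implementation: `validate_seed_request`, the `email` and `plan` body field names, and returning `None` to mean "use `AGENT_RECOMMENDED_SCOPES`" are assumptions for illustration.

```python
from typing import Optional

ALLOWED_EMAIL_SUFFIXES = (
    "@testorax-fixtures.local",
    "@scope-test.testorax-fixtures.local",
)
ALLOWED_PLANS = frozenset({"pay_as_you_go", "studio", "free"})

def validate_seed_request(body: dict) -> dict:
    """Validate a fixture-key seed body; raise ValueError on any violation."""
    email = body.get("email", "")
    # The suffix must start at the '@', so look-alike subdomains fail.
    if not email.endswith(ALLOWED_EMAIL_SUFFIXES):
        raise ValueError("email must be on a fixtures domain")

    plan = body.get("plan", "pay_as_you_go")
    if plan not in ALLOWED_PLANS:
        # Covers admin-tier plans as well as typos.
        raise ValueError(f"plan not allowed for fixture keys: {plan!r}")

    scopes_csv = body.get("scopes")
    scopes: Optional[list[str]] = None  # None means: use the default scopes
    if scopes_csv:
        scopes = [s.strip() for s in scopes_csv.split(",") if s.strip()]

    return {"email": email, "plan": plan, "scopes": scopes}
```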

### 🚨 Improvement #3 — P0 cross-tenant body.email override (caught by Improvement #2)

While running the fixture key seeder against cross-tenant probes, an actual P0 vulnerability was uncovered:

> A non-admin API key bound to email `r7-fixture-a@testorax-fixtures.local` was able to submit `POST /api/runs/start` with `body.email: "<other-customer@example.com>"` and successfully mint a run owned by another customer's account. Bill-on-someone-else's-account.

The fix landed within the same session:
- Non-admin callers may NOT override `body.email`
- When `body.email` differs from `validatedEmail` on a non-admin call → `403 forbidden, code: cross_tenant_blocked`
- Admin keys retain legacy behavior (body.email wins for ops tooling)
- Same fix applied to BOTH `/api/runs/start` and `/api/runs/bypass`

**Live verified end-to-end with two real fixture keys:**
- Cross-tenant probe → `403 cross_tenant_blocked` with explanatory message ✅
- Matching email or omitted email → 200 run minted ✅
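
The ownership rule reduces to a small decision function. A minimal sketch, assuming `validated_email` is the email bound to the API key and `is_admin` is already resolved by the caller; `resolve_run_owner` and the exception class are hypothetical names, and the exact-string comparison is an assumption (the real guard may normalize case).

```python
from typing import Optional

class CrossTenantBlocked(Exception):
    """Maps to HTTP 403 with code cross_tenant_blocked."""
    status = 403
    code = "cross_tenant_blocked"

def resolve_run_owner(
    validated_email: str,
    body_email: Optional[str],
    is_admin: bool,
) -> str:
    """Decide which account owns (and pays for) the new run."""
    # Omitted or matching email: the key's own account owns the run.
    if body_email is None or body_email == validated_email:
        return validated_email
    # Admin keys keep the legacy override for ops tooling.
    if is_admin:
        return body_email
    # Any other mismatch is a cross-tenant attempt.
    raise CrossTenantBlocked(
        "body.email does not match the account bound to this API key"
    )
```

Applying one shared function from both `/api/runs/start` and `/api/runs/bypass` would keep the two endpoints from drifting apart again.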

The agent's #5 ask wasn't just a deduction — it surfaced a real vulnerability we couldn't see without two real isolated keys. The proof contract now stands stronger because we can demonstrate cross-tenant isolation rather than infer it.

### Cumulative agent-API safety progression

| Round | Surface tested | Real P0/P1 found | Fixed |
|---|---|---|---|
| R1 | Initial autonomous flow | Failure intel as SPA HTML | ✅ |
| R3 | Classifier verification | Severity inflation | ✅ |
| R4 | 14 attack categories | webhookUrl SSRF (P0) | ✅ |
| R5 | Customer-data plane | 0 found (13/13 PROOF HOLDS) | — |
| R6 | Billing + concurrency | Idempotency race (P0) | ✅ |
| R7 | Stage C/D + fixture proof | **body.email cross-tenant override (P0)** | ✅ |

**Three real P0s found and fixed before launch.** Adversarial rounds R4, R6, and R7 each surfaced exactly one P0 that fixture-only testing wouldn't have caught; R5 found none.

## Proof Depth Telemetry

Fetch the proof packet to see `proofDepthTelemetry.{visual,network,persistence}`.
- Screenshot captured is NOT visual diff
- Network summary is NOT raw HAR
- Write observed is NOT persistence verified
- Readback verified is stronger than UI-only success
- Do not claim backend persistence without readback proof
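
The last rule can be enforced mechanically before an agent writes its summary. This sketch is heavily hedged: the guide does not list the values `proofDepthTelemetry.persistence` can take, so the string `readback_verified` below is a hypothetical placeholder, as is the helper name. Check the actual proof packet before relying on it.

```python
# HYPOTHETICAL value: replace with whatever the proof packet really emits.
READBACK_VERIFIED = "readback_verified"

def may_claim_backend_persistence(proof_packet: dict) -> bool:
    """Allow a persistence claim only when readback proof is present.

    Anything weaker (write observed, UI-only success, missing telemetry)
    returns False, so the default is to make no claim.
    """
    telemetry = proof_packet.get("proofDepthTelemetry", {})
    return telemetry.get("persistence") == READBACK_VERIFIED
```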

## Autonomous Config Discovery (Batch 1a — shared module)

Shared builder `buildConfigDiscovery()` from `@testorax/shared/configDiscovery` produces draft Workflow/CRUD/Campaign configs from existing evidence. API/CLI/MCP wiring ships in a follow-up batch.

Drafts only — never auto-runs.
- destructiveAllowed=false by default
- hard_delete NEVER default
- Confirm routes and selectors before running CRUD
- Provide safeMutationPolicy + cleanupPolicy before running mutation
- Use QA prefix for cleanup
- Do not claim autonomous CRUD coverage from draft config alone
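
These rules can be turned into a pre-run gate on a draft. A minimal sketch: `destructiveAllowed`, `hard_delete`, `safeMutationPolicy`, and `cleanupPolicy` are named in the list above, while the `operations` and `selectorsConfirmed` keys and the helper itself are assumptions about the draft's shape, made only for illustration.

```python
def draft_run_blockers(draft: dict) -> list[str]:
    """List the reasons a draft config must not be run as-is.

    An empty list means no blocker was found by this check; it does not
    mean the draft has been exercised or that CRUD coverage exists.
    """
    blockers = []
    if draft.get("destructiveAllowed", False):
        blockers.append("destructiveAllowed is true; needs explicit review")
    if "hard_delete" in draft.get("operations", []):  # assumed key
        blockers.append("hard_delete present; never a default operation")
    for policy in ("safeMutationPolicy", "cleanupPolicy"):
        if not draft.get(policy):
            blockers.append(f"{policy} missing")
    if not draft.get("selectorsConfirmed", False):  # assumed key
        blockers.append("routes and selectors not confirmed")
    return blockers
```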
