Local Artifacts
ReplayLab writes local artifacts so capture and replay can work without a hosted service.
Capsules
A capsule is a directory under .replaylab/capsules/<capsule_id>/.
It contains the captured run metadata, step records, boundary calls, and payload files needed for replay.
Same-process startup instrumentation usually writes one provider capsule when a handle.capture(...) scope exits.
That is the primary production-style integration path.
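A minimal sketch of that path, assuming handle.capture(...) works as a context manager (its arguments and the workflow function here are placeholders, not the SDK's exact signature):

import replaylab

def run_support_bot_workflow():
    ...  # placeholder: your app code that calls OpenAI / requests / httpx

handle = replaylab.init(project_name="support-bot")

# One provider capsule is written when this scope exits.
with handle.capture():
    run_support_bot_workflow()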
replaylab run is different because it wraps a child command.
It currently writes two kinds of capsules:
- a wrapper capsule for command metadata and exit status
- a child provider capsule for captured OpenAI, requests, or httpx calls
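A sketch of that wrapper form, reusing the example app from later in this section (passing --local-store-root to replaylab run is an assumption mirrored from the other commands here):

uv run replaylab run \
  --local-store-root .replaylab \
  -- python examples/dogfood_mvp/app.py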
Use the child provider capsule for replay and generated tests. The easiest way to find it is:
uv run replaylab capsule list --local-store-root .replaylab
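If a script needs to locate capsules without the CLI, the directory layout above makes this a plain filesystem walk. A sketch (it cannot tell wrapper capsules from child provider capsules, so prefer the CLI listing when that distinction matters):

from pathlib import Path

capsule_root = Path(".replaylab") / "capsules"
for capsule_dir in sorted(capsule_root.iterdir(), key=lambda p: p.stat().st_mtime):
    if capsule_dir.is_dir():
        print(capsule_dir.name)  # capsule_id, oldest first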
Product Artifact Views
The local app and product API sit above these storage files. They project the local store into product objects a user can reason about:
- Captured run: the baseline run/trace the user captured.
- Regression replay: a recorded-response replay report attached to a captured run.
- Live experiment: a live-provider run created to compare current behavior against a baseline captured run.
- Generated provider replay guard: pytest evidence generated from a captured run/replay.
- Export evidence: static HTML or React viewer artifacts created for sharing or CI evidence.
Internally, live experiments may create new capsule directories. The app treats those capsules as evidence under the original captured run, not as new primary captured-run baselines. Raw capsule and report endpoints remain available for compatibility and debugging, but the normal product workflow uses captured-run ownership first.
Full-Payload Capture
Replay can inspect metadata-only capsules, but it cannot serve a provider response unless the response payload was stored. For replay and generated tests, capture with:
--capture-payload-policy full
ReplayLab stores redacted payload bytes in content-addressed files. Inspection commands summarize payload counts and hashes; they do not print payload file contents.
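On the replaylab run wrapper path, the policy flag goes before the -- separator, as in this sketch (same flag assumptions as the wrapper sketch above):

uv run replaylab run \
  --capture-payload-policy full \
  --local-store-root .replaylab \
  -- python examples/dogfood_mvp/app.py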
For startup SDK capture, configure the policy near app startup:
import replaylab
from replaylab import CapturePayloadPolicy

# Full policy stores redacted payload bytes so replay can serve recorded responses.
handle = replaylab.init(
    project_name="support-bot",
    auto_patch_integrations="auto",
    capture_payload_policy=CapturePayloadPolicy.FULL,
)
Use explicit provider tuples such as ("openai", "anthropic", "gemini", "requests", "httpx")
when you want to limit which provider patchers ReplayLab installs.
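For example, a sketch assuming the tuple is passed through the same auto_patch_integrations parameter shown above:

import replaylab
from replaylab import CapturePayloadPolicy

handle = replaylab.init(
    project_name="support-bot",
    # Explicit tuple instead of "auto": only these provider patchers are installed.
    auto_patch_integrations=("openai", "anthropic", "gemini", "requests", "httpx"),
    capture_payload_policy=CapturePayloadPolicy.FULL,
)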
Safe Regression Replay
Regression replay serves recorded provider responses without calling the live provider. Read operations replay directly. External writes are blocked by default unless the safe-write policy explicitly allows no-op replay for that class of boundary.
Local Viewers
For day-to-day human diagnosis, start the local app against the store root:
uv run replaylab app --local-store-root .replaylab
The app serves a localhost-only browser UI from the packaged React assets. It lists product-level captured runs and their attached regression replays, live experiments, generated regressions, and exports. It separates failed from clean regression replays, resolves a report's source captured run when present, and starts with explicit shortcuts to the latest failed regression replay, the latest regression replay, and the latest captured run. When the latest report is also the latest failure, that shortcut is shown once instead of duplicated.

The selected artifact shows a deterministic diagnosis and action buttons before the boundary tables. ReplayLab recovers run profiles from artifact metadata when available and detects providers from captured boundaries, so framework workflows show openai or requests as facts instead of asking users to choose provider integrations.

The app can run local actions from the selected capsule or report: capture again, regression replay, live experiment, compare, evidence export, and provider replay guard generation. A regression replay reruns the app while ReplayLab serves recorded provider responses; it proves regression safety for that captured provider-boundary flow, not current live model behavior or execution-tool/I/O safety. A live experiment is separate and explicit: after a confirmation warning, it reruns the workflow with current code, prompts, tools, framework versions, and live provider responses.

The live-experiment workbench can apply supported variants to the run or provider request, including scenario input, canonical provider instructions, model and sampling parameters, response format and tool choice, response-mode intent, provider conversion previews for OpenAI, Anthropic, and Gemini, and tool declaration descriptions or schemas. Conversion reports show mapped fields, dropped fields, changed semantics, and the target payload; preview-only conversions can be selected for inspection while the live run stays blocked until ReplayLab can safely preserve the app call contract.

The app treats scenario input as the primary user-input path; direct provider user-message overrides are advanced controls. Tool definitions appear as readable cards and parameter summaries before raw JSON schema editing.

The captured-run page keeps completed live experiments in its Live experiments tab, including labels, hypotheses, applied variant summaries, comparability verdicts, and links to the experiment trace. Fresh captures write run profile metadata automatically; older captures that lack profile metadata show app-managed setup candidates and can save a detected setup before app-owned actions run.

Captured-run pages render a split-pane trace explorer: the trace skeleton keeps LLM/API boundaries, assistant messages, LLM requested tool rows, and Provider protocol tool result rows navigable, while the selected detail pane shows formatted redacted Input/Output previews. Raw JSON/text stays collapsed as replay/debug data.

The Captured runs route keeps the timestamped run list on screen, shows the regression replay count plus the latest regression replay status for each captured run, opens the latest regression replay directly, and opens the selected run in a right-side inspector with Trace, Regression replays, and Live experiments tabs.

Mutating local app requests require a per-server action token embedded in the localhost page. The app's API returns only redacted previews and structured ReplayLab step status, never unredacted payloads, raw headers, API keys, source bodies, or raw child stdout/stderr.
If you have a capsule and want the same local loop in a script, use:
uv run replaylab workflow local <child_capsule_id> \
--local-store-root .replaylab \
--auto-patch-integrations auto \
--viewer-output replay-viewer.html \
-- python examples/dogfood_mvp/app.py
The guided command is the same orchestration path used by the local app. It replays the app,
compares the report, writes the React viewer, and can generate and run pytest when
--generate-test --run-generated-test is supplied.
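For example, to generate and run the pytest guard in the same pass:

uv run replaylab workflow local <child_capsule_id> \
  --local-store-root .replaylab \
  --generate-test --run-generated-test \
  -- python examples/dogfood_mvp/app.py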
After replay, open the richer browser-readable React viewer:
uv run replaylab report view report \
.replaylab/replays/replay_dogfood_mvp/report.json \
--capsule <child_capsule_id> \
--local-store-root .replaylab \
--output replay-viewer.html
The command writes a self-contained read-only .html file and opens it in your browser.
It embeds the local React viewer bundle plus secret-safe report data, so installed users do not
need Node to open it.
It includes report status, capsule/run IDs, provider and integration labels, summary counts,
comparison status, a top-level diagnosis, a "What to do next" section, quick filters, filterable
boundary rows, request hashes, payload availability booleans, failure groups, expected-vs-actual
mismatch details, copyable command blocks, and grouped next commands.
For failed replays, the diagnosis names the first divergence, shows expected-vs-actual call evidence,
and recommends the first corrective action before the dense tables.
It does not open or render payload files, raw headers, API keys, or source bodies.
To compare two replay attempts in the React viewer, open a report diff:
uv run replaylab report view diff \
.replaylab/replays/replay_before/report.json \
.replaylab/replays/replay_after/report.json \
--output replay-diff-viewer.html
Use replaylab report export-viewer report|diff when automation should write the same viewer files
without opening a browser.
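For example, a sketch assuming export-viewer accepts the same arguments as report view:

uv run replaylab report export-viewer report \
  .replaylab/replays/replay_dogfood_mvp/report.json \
  --capsule <child_capsule_id> \
  --local-store-root .replaylab \
  --output replay-viewer.html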
The dependency-free static HTML fallback remains available:
uv run replaylab report export-html \
.replaylab/replays/replay_dogfood_mvp/report.json \
--capsule <child_capsule_id> \
--local-store-root .replaylab \
--output replay-report.html
For static report diffs, use:
uv run replaylab report diff-html \
.replaylab/replays/replay_before/report.json \
.replaylab/replays/replay_after/report.json \
--output replay-diff.html
Both viewer paths group improved outcomes, regressions, changed failures, and unchanged failures. The React diff viewer also adds a diagnosis of whether the candidate got better, worse, or stayed the same, plus filters and search over changed rows.
Generated Provider Replay Guards
Generated pytest tests call replaylab replay with the original application command.
They do not test boundary JSON in isolation.
That keeps the regression close to the code path users actually run.
They are provider replay guards: they verify recorded provider-boundary replay, not safe workflow
regression for application tools, databases, files, subprocesses, or external I/O.
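The shape of such a guard is roughly the sketch below; real tests are generated by ReplayLab, and the replay arguments here are illustrative assumptions:

import subprocess

def test_provider_replay_guard():
    # Replays the original application command against recorded provider
    # responses; "<child_capsule_id>" and the flags are placeholders.
    result = subprocess.run(
        [
            "uv", "run", "replaylab", "replay", "<child_capsule_id>",
            "--local-store-root", ".replaylab",
            "--", "python", "examples/dogfood_mvp/app.py",
        ],
        check=False,
    )
    assert result.returncode == 0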