Local Artifacts
ReplayLab writes local artifacts so capture and replay can work without a hosted service.
Capsules
A capsule is a directory under .replaylab/capsules/<capsule_id>/.
It contains the captured run metadata, step records, boundary calls, and payload files needed for replay.
Same-process startup instrumentation usually writes one provider capsule when a handle.capture(...) scope exits.
That is the primary production-style integration path.
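A minimal sketch of that path, assuming handle.capture(...) works as a context manager (its arguments and the workflow function here are placeholders, not the SDK's exact signature):

import replaylab

def run_support_bot_workflow():
    ...  # placeholder: your app code that calls OpenAI / requests / httpx

handle = replaylab.init(project_name="support-bot")

# One provider capsule is written when this scope exits.
with handle.capture():
    run_support_bot_workflow()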
replaylab run is different because it wraps a child command.
It currently writes two kinds of capsules:
- a wrapper capsule for command metadata and exit status
- a child provider capsule for captured OpenAI, requests, or httpx calls
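A sketch of that wrapper form, reusing the example app from later in this section (passing --local-store-root to replaylab run is an assumption mirrored from the other commands here):

uv run replaylab run \
  --local-store-root .replaylab \
  -- python examples/dogfood_mvp/app.py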
Use the child provider capsule for replay and generated tests. The easiest way to find it is:
uv run replaylab capsule list --local-store-root .replaylab
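If a script needs to locate capsules without the CLI, the directory layout above makes this a plain filesystem walk. A sketch (it cannot tell wrapper capsules from child provider capsules, so prefer the CLI listing when that distinction matters):

from pathlib import Path

capsule_root = Path(".replaylab") / "capsules"
for capsule_dir in sorted(capsule_root.iterdir(), key=lambda p: p.stat().st_mtime):
    if capsule_dir.is_dir():
        print(capsule_dir.name)  # capsule_id, oldest first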
Product Artifact Views
The local app and product API sit above these storage files. They project the local store into product objects a user can reason about:
- Captured run: the baseline run/trace the user captured.
- Regression replay: a recorded-response replay report attached to a captured run.
- Live experiment: a live-provider run created to compare current behavior against a baseline captured run.
- Generated provider replay guard: pytest evidence generated from a captured run/replay.
- Export evidence: static HTML or React viewer artifacts created for sharing or CI evidence.
Internally, live experiments may create new capsule directories. The app treats those capsules as evidence under the original captured run, not as new primary captured-run baselines. Raw capsule and report endpoints remain available for compatibility and debugging, but the normal product workflow uses captured-run ownership first.
Full-Payload Capture
Replay can inspect metadata-only capsules, but it cannot serve a provider response unless the response payload was stored. For replay and generated tests, capture with:
--capture-payload-policy full
ReplayLab stores redacted payload bytes in content-addressed files. Inspection commands summarize payload counts and hashes; they do not print payload file contents.
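On the replaylab run wrapper path, the policy flag goes before the -- separator, as in this sketch (same flag assumptions as the wrapper sketch above):

uv run replaylab run \
  --capture-payload-policy full \
  --local-store-root .replaylab \
  -- python examples/dogfood_mvp/app.py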
For startup SDK capture, configure the policy near app startup:
import replaylab
from replaylab import CapturePayloadPolicy

# Full policy stores redacted payload bytes so replay can serve recorded responses.
handle = replaylab.init(
    project_name="support-bot",
    auto_patch_integrations="auto",
    capture_payload_policy=CapturePayloadPolicy.FULL,
)
Use explicit provider tuples such as ("openai", "anthropic", "gemini", "requests", "httpx")
when you want to limit which provider patchers ReplayLab installs.
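For example, a sketch assuming the tuple is passed through the same auto_patch_integrations parameter shown above:

import replaylab
from replaylab import CapturePayloadPolicy

handle = replaylab.init(
    project_name="support-bot",
    # Explicit tuple instead of "auto": only these provider patchers are installed.
    auto_patch_integrations=("openai", "anthropic", "gemini", "requests", "httpx"),
    capture_payload_policy=CapturePayloadPolicy.FULL,
)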
Safe Regression Replay
Regression replay serves recorded provider responses without calling the live provider. Read operations replay directly. External writes are blocked by default unless the safe-write policy explicitly allows no-op replay for that class of boundary.
Local Viewers
For day-to-day human diagnosis, start the local app against the store root:
uv run replaylab app --local-store-root .replaylab
The app serves a localhost-only browser UI from the packaged React assets. It lists product-level captured runs and their attached regression replays, live experiments, generated regressions, and exports. It separates failed from clean regression replays, resolves a report's source captured run when present, and starts with explicit shortcuts to the latest failed regression replay, the latest regression replay, and the latest captured run. When the latest report is also the latest failure, that shortcut is shown once instead of duplicated.

The selected artifact shows a deterministic diagnosis and action buttons before the boundary tables. ReplayLab recovers run profiles from artifact metadata when available and detects providers from captured boundaries, so framework workflows show openai or requests as facts instead of asking users to choose provider integrations.

The app can run local actions from the selected capsule or report: capture again, regression replay, live experiment, compare, evidence export, and provider replay guard generation. A regression replay reruns the app while ReplayLab serves recorded provider responses; it proves regression safety for that captured provider-boundary flow, not current live model behavior or execution-tool/I/O safety. A live experiment is separate and explicit: after a confirmation warning, it reruns the workflow with current code, prompts, tools, framework versions, and live provider responses.

The live-experiment workbench can apply supported variants to the run or provider request, including scenario input, canonical provider instructions, model and sampling parameters, response format and tool choice, response-mode intent, provider conversion previews for OpenAI, Anthropic, and Gemini, and tool declaration descriptions or schemas. Conversion reports show mapped fields, dropped fields, changed semantics, and the target payload; preview-only conversions can be selected for inspection while the live run stays blocked until ReplayLab can safely preserve the app call contract.

The app treats scenario input as the primary user-input path; direct provider user-message overrides are advanced controls. Tool definitions appear as readable cards and parameter summaries before raw JSON schema editing.

The captured-run page keeps completed live experiments in its Live experiments tab, including labels, hypotheses, applied variant summaries, comparability verdicts, and links to the experiment trace. Fresh captures write run profile metadata automatically; older captures that lack profile metadata show app-managed setup candidates and can save a detected setup before app-owned actions run.

Captured-run pages render a split-pane trace explorer: the trace skeleton keeps LLM/API boundaries, assistant messages, LLM requested tool rows, and Provider protocol tool result rows navigable, while the selected detail pane shows formatted redacted Input/Output previews. Raw JSON/text stays collapsed as replay/debug data.

The Captured runs route keeps the timestamped run list on screen, shows the regression replay count plus the latest regression replay status for each captured run, opens the latest regression replay directly, and opens the selected run in a right-side inspector with Trace, Regression replays, and Live experiments tabs.

Mutating local app requests require a per-server action token embedded in the localhost page. The app's API returns only redacted previews and structured ReplayLab step status, never unredacted payloads, raw headers, API keys, source bodies, or raw child stdout/stderr.
If you have a capsule and want the same local loop in a script, use:
uv run replaylab workflow local <child_capsule_id> \
--local-store-root .replaylab \
--auto-patch-integrations auto \
--viewer-output replay-viewer.html \
-- python examples/dogfood_mvp/app.py
The guided command is the same orchestration path used by the local app. It replays the app,
compares the report, writes the React viewer, and can generate and run pytest when
--generate-test --run-generated-test is supplied.
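For example, to generate and run the pytest guard in the same pass:

uv run replaylab workflow local <child_capsule_id> \
  --local-store-root .replaylab \
  --generate-test --run-generated-test \
  -- python examples/dogfood_mvp/app.py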
After replay, open the richer browser-readable React viewer:
uv run replaylab report view report \
.replaylab/replays/replay_dogfood_mvp/report.json \
--capsule <child_capsule_id> \
--local-store-root .replaylab \
--output replay-viewer.html
The command writes a self-contained read-only .html file and opens it in your browser.
It embeds the local React viewer bundle plus secret-safe report data, so installed users do not
need Node to open it.
It includes report status, capsule/run IDs, provider and integration labels, summary counts,
comparison status, a top-level diagnosis, a "What to do next" section, quick filters, filterable
boundary rows, request hashes, payload availability booleans, failure groups, expected-vs-actual
mismatch details, copyable command blocks, and grouped next commands.
For failed replays, the diagnosis names the first divergence, shows expected-vs-actual call evidence,
and recommends the first corrective action before the dense tables.
It does not open or render payload files, raw headers, API keys, or source bodies.
To compare two replay attempts in the React viewer, open a report diff:
uv run replaylab report view diff \
.replaylab/replays/replay_before/report.json \
.replaylab/replays/replay_after/report.json \
--output replay-diff-viewer.html
Use replaylab report export-viewer report|diff when automation should write the same viewer files
without opening a browser.
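For example, a sketch assuming export-viewer accepts the same arguments as report view:

uv run replaylab report export-viewer report \
  .replaylab/replays/replay_dogfood_mvp/report.json \
  --capsule <child_capsule_id> \
  --local-store-root .replaylab \
  --output replay-viewer.html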
The dependency-free static HTML fallback remains available:
uv run replaylab report export-html \
.replaylab/replays/replay_dogfood_mvp/report.json \
--capsule <child_capsule_id> \
--local-store-root .replaylab \
--output replay-report.html
For static report diffs, use:
uv run replaylab report diff-html \
.replaylab/replays/replay_before/report.json \
.replaylab/replays/replay_after/report.json \
--output replay-diff.html
Both viewer paths group improved outcomes, regressions, changed failures, and unchanged failures. The React diff viewer also adds a diagnosis of whether the candidate got better, worse, or stayed the same, plus filters and search over changed rows.
Generated Provider Replay Guards
Generated pytest tests call replaylab replay with the original application command.
They do not test boundary JSON in isolation.
That keeps the regression close to the code path users actually run.
They are provider replay guards: they verify recorded provider-boundary replay, not safe workflow
regression for application tools, databases, files, subprocesses, or external I/O.
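The shape of such a guard is roughly the sketch below; real tests are generated by ReplayLab, and the replay arguments here are illustrative assumptions:

import subprocess

def test_provider_replay_guard():
    # Replays the original application command against recorded provider
    # responses; "<child_capsule_id>" and the flags are placeholders.
    result = subprocess.run(
        [
            "uv", "run", "replaylab", "replay", "<child_capsule_id>",
            "--local-store-root", ".replaylab",
            "--", "python", "examples/dogfood_mvp/app.py",
        ],
        check=False,
    )
    assert result.returncode == 0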