Replay Safety
ReplayLab currently separates provider replay from full workflow safety.
Provider Replay Guard
A provider replay guard runs the application again while ReplayLab serves recorded provider responses for captured provider boundaries. This verifies that the current code path still consumes the captured provider-boundary sequence.
Provider replay guards do not prove that application tools, database writes, HTTP side effects, file writes, subprocesses, or other lower-level I/O were controlled.
Safety Preflight
The local app shows a read-only replay safety preflight for captured runs and regression replay reports. The preflight explains:
- which provider boundaries ReplayLab can replay
- which provider-facing model tools were declared
- which tool requests came from the provider protocol
- which provider protocol tool results were observed
- which local Python callables are likely implementation candidates when a safe app root is available
- which explicitly wrapped Python tool callables actually ran through ReplayLab's SDK wrapper
- which HTTP effects were observed with sanitized runtime stack origins
- which model tools, candidate callables, and observed HTTP effects can be linked by source and stack evidence
- which read-only policy review items would need to be resolved before effect control can exist
- which project effect policy decisions have already been saved for those review items
- which HTTP effects were allowed or blocked by opt-in project policy enforcement
- which filesystem mutation and subprocess hooks were active, observed, or blocked by opt-in local-effect control
- which SQLite database statements were observed, proposed for review, allowed, or blocked by opt-in database-effect control
- which direct raw-socket network attempts were observed or blocked by opt-in network-effect control
- which queue/pubsub enqueue or publish attempts were observed or blocked by opt-in queue/pubsub effect control
- which unsupported HTTP client attempts were observed or blocked by opt-in unsupported HTTP client control
- which sandbox containment evidence exists for regression replay reports
- whether the current evidence satisfies the read-only safe workflow readiness gate
- whether safe workflow regression is available, or which blockers keep it unavailable
For OpenAI Responses captures with tool calls, ReplayLab labels protocol evidence as LLM requested
tool and Provider protocol tool result. These labels do not mean ReplayLab captured the Python
callable that ran the tool.
Tool Resolution
ReplayLab can statically inspect local Python source for likely implementations of provider-visible
model tools. The local app shows these rows as Tool implementation candidates.
Resolution is advisory. ReplayLab parses source with ast; it does not import or execute app code,
does not infer side-effect class from names, and does not treat a high-confidence match as controlled
execution. See Tool Resolution for the resolver boundary.
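ReplayLab's resolver internals are not documented here; as a rough sketch of what ast-based, name-only candidate matching looks like (the helper and sample source below are illustrative, not ReplayLab's code):

```python
import ast

def find_tool_candidates(source: str, tool_names: set[str]) -> list[tuple[str, int]]:
    # Parse only; the inspected module is never imported or executed.
    tree = ast.parse(source)
    candidates = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name in tool_names:
            candidates.append((node.name, node.lineno))
    return candidates

sample = "def lookup_customer(customer_id):\n    return None\n"
print(find_tool_candidates(sample, {"lookup_customer"}))  # [('lookup_customer', 1)]
```

A match found this way says nothing about side-effect class or controlled execution, which is why resolution stays advisory.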
Execution Tool Control
Applications can opt into explicit execution-tool control evidence by wrapping a local Python callable:
import replaylab

controlled_lookup_customer = replaylab.control_tool(
    lookup_customer,
    name="lookup_customer",
    model_tool_name="lookup_customer",
)
The same wrapper is available from a handle:
controlled_lookup_customer = handle.control_tool(
    lookup_customer,
    name="lookup_customer",
    model_tool_name="lookup_customer",
)
The wrapper supports sync and async callables. It preserves the callable's return value and original
exception behavior. When the callable completes or fails, ReplayLab records a TOOL boundary with
secret-safe execution-control metadata: model tool id/name when supplied, callable module and
qualified name, optional app-root-relative source path, line number, start/end timestamps, duration,
success/failure status, and argument names only.
ReplayLab does not record argument values, return values, locals, source text, payload bodies,
headers, environment values, or absolute paths. V1 is explicit only: ReplayLab does not auto-wrap
resolver candidates, add decorators, add framework adapters, sandbox execution, or enforce tool
policies. The local app shows this as Execution tool control, which means the callable ran through
ReplayLab's wrapper. It does not mean the whole workflow is safe.
HTTP Effect Attribution
For HTTP calls captured through the requests and httpx libraries, ReplayLab records sanitized runtime stack evidence on the
captured HTTP boundary. The local app shows this as Observed HTTP effects, including the HTTP
method, host, nearest user-code function, optional app-root-relative source path, and attribution
status.
This evidence is secret-safe and intentionally narrow: no source text, locals, arguments, return values, environment values, headers, payload bodies, or absolute paths are shown in the app or API payload. Stack attribution is advisory evidence unless opt-in HTTP effect policy enforcement is enabled.
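The "nearest user-code function" part of that evidence can be sketched from a sanitized stack summary. This helper and the fabricated frames are illustrative assumptions, not ReplayLab's attribution code:

```python
import os
import traceback

def nearest_user_frame(stack: list[traceback.FrameSummary], app_root: str):
    # Walk the stack innermost-first; return (app-root-relative path, function
    # name) for the first frame under the app root. No locals or source text.
    for frame in reversed(stack):
        if frame.filename.startswith(app_root + os.sep):
            return os.path.relpath(frame.filename, app_root), frame.name
    return None  # attribution status: no user-code frame found

stack = [
    traceback.FrameSummary("/app/main.py", 10, "main"),
    traceback.FrameSummary("/app/tools/crm.py", 42, "lookup_customer"),
    traceback.FrameSummary("/site-packages/requests/api.py", 59, "get"),
]
print(nearest_user_frame(stack, "/app"))  # ('tools/crm.py', 'lookup_customer') on POSIX
```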
Tool Effect Map
When a capture has all three evidence layers, ReplayLab can show a read-only Tool effect map:
- a provider-visible model tool
- a likely local implementation candidate
- an observed HTTP effect whose sanitized stack frame points to that candidate
This link is built from source paths and qualified names only. It does not inspect raw payloads, headers, locals, arguments, return values, source text, or absolute paths.
The map is still advisory. It helps reviewers understand the likely chain from model tool intent to runtime HTTP I/O, but it does not mean ReplayLab captured the Python tool call, controlled its execution, blocked the HTTP call, or enforced a safety policy.
Effect Policy Proposal
When HTTP effects are present, ReplayLab can show an Effect policy proposal in the safety
preflight. A proposal item is a review prompt for future effect control: it points at the observed
HTTP effect, the mapped model tool and candidate when available, and the policy decision a user
would need to review later.
Policy proposals are read-only. ReplayLab does not persist a policy file, enforce the proposal, block or replay HTTP effects, sandbox execution, or treat the proposal as proof that Python execution was controlled.
Effect Policy Review
The local app can save project-scoped effect policy review decisions under
.replaylab/app/effect-policies/{project_id}.json. Saved rules copy immutable evidence from the
proposal item and let the user edit only the decision, review status, and a short note.
Saved effect policy is not enforcement by itself. It becomes input to opt-in HTTP effect policy
control only when capture or replay is run with HTTP effect policy mode set to enforce.
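The on-disk rule shape is not specified here; a hypothetical example of what a project-scoped policy file could contain (every field name below is an illustrative assumption, not ReplayLab's schema):

```json
{
  "project_id": "example-project",
  "rules": [
    {
      "decision": "allow_observed_effect",
      "review_status": "accepted",
      "note": "CRM lookup is read-only",
      "evidence": {
        "method": "GET",
        "host": "api.example.com",
        "resource": "/customers/{id}",
        "side_effect_class": "read",
        "source_path": "tools/crm.py",
        "qualname": "lookup_customer"
      }
    }
  ]
}
```

The evidence fields are copied from the proposal item and stay immutable; only the decision, review status, and note are user-edited.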
HTTP Effect Control
HTTP effect policy control is opt-in and HTTP-only. Enable it with:
REPLAYLAB_HTTP_EFFECT_POLICY_MODE=enforce
or with CLI wrappers:
replaylab run --http-effect-policy-mode enforce -- ...
replaylab replay <capsule> --http-effect-policy-mode enforce -- ...
replaylab workflow local <capsule> --http-effect-policy-mode enforce -- ...
In observe mode, ReplayLab records control evidence but does not block. In enforce mode,
ReplayLab checks sanitized runtime HTTP evidence against saved project policy rules before allowing
the requests or httpx path to proceed. A rule allows an HTTP effect only when the saved decision
is allow_observed_effect, the rule status is accepted or edited, the rule is not ambiguous,
and the sanitized method, host, resource, side-effect class, and available source/qualname evidence
match.
Unmatched, missing, unaccepted, or ambiguous policy evidence fails closed in enforce mode.
Captured runs record a blocked HTTP boundary without response payload; regression replays report a
blocked replay result instead of serving the recorded HTTP payload.
This is still not full workflow safety. HTTP control does not prove ReplayLab captured or controlled the Python tool execution, does not sandbox the app, and does not enable safe workflow regression.
Local Effect Control
Local effect control is opt-in for filesystem mutations and subprocess launches. Enable it with:
REPLAYLAB_LOCAL_EFFECT_CONTROL_MODE=enforce
or with CLI wrappers:
replaylab run --local-effect-control-mode enforce -- ...
replaylab replay <capsule> --local-effect-control-mode enforce -- ...
replaylab workflow local <capsule> --local-effect-control-mode enforce -- ...
In observe mode, ReplayLab installs these hooks only when local_effects is explicitly requested
in auto_patch_integrations. When hooks are active in observe mode, app-origin filesystem
mutations and subprocess launches are recorded as secret-safe evidence and allowed. In enforce
mode, child run and replay processes install the hooks automatically and fail closed before
app-origin file mutations or subprocess launches proceed.
The evidence records effect kind, operation, optional app-root-relative path, subprocess executable
basename, nearest user-code origin, mode, status, timestamps, and duration. It never records file
contents, full command arguments, environment values, cwd absolute paths, source text, locals,
return values, headers, payloads, or secrets. ReplayLab-owned .replaylab writes are treated as
internal and allowed.
This is narrow local-effect control, not sandboxing. ReplayLab does not control database drivers, queues, raw sockets, native extensions, or arbitrary operating-system effects through these hooks, and it does not mock blocked effects.
SQLite Database Effect Control
Database effect control is opt-in and SQLite-only in V1. It covers standard-library sqlite3 and
synchronous SQLAlchemy SQLite usage that reaches the pysqlite driver. Enable it with:
REPLAYLAB_DATABASE_EFFECT_CONTROL_MODE=enforce
or with CLI wrappers:
replaylab run --database-effect-control-mode enforce -- ...
replaylab replay <capsule> --database-effect-control-mode enforce -- ...
replaylab workflow local <capsule> --database-effect-control-mode enforce -- ...
In observe mode, SQLite hooks install only when database_effects is explicitly requested in
auto_patch_integrations. ReplayLab records statement-shape evidence and allows execution. In
enforce mode, child run and replay processes install SQLite hooks automatically and fail
closed before a statement runs unless it exactly matches an accepted project database policy rule.
Policy matching uses the SQLite backend, display-safe database resource, operation class, normalized SQL shape hash, and available source path plus qualified name. ReplayLab never records raw SQL, parameter values, rows, database contents, connection strings with secrets, source text, locals, arguments, return values, environment values, or absolute paths.
This is not broad database support. Non-SQLite SQLAlchemy URLs, async SQLAlchemy, aiosqlite,
Postgres, MySQL, MongoDB, queues, native extensions, and sandbox guarantees remain unsupported
scope blockers. Blocked database effects are not mocked.
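A "normalized SQL shape hash" can be illustrated with a toy normalizer. ReplayLab's actual normalization rules are not documented here; the regex-based literal stripping below is an assumption that only conveys the idea that literal values never reach the hash:

```python
import hashlib
import re

def sql_shape_hash(sql: str) -> str:
    # Replace string and numeric literals with placeholders, collapse
    # whitespace, and hash the result: only the statement *shape* survives.
    shape = re.sub(r"'[^']*'", "?", sql)
    shape = re.sub(r"\b\d+\b", "?", shape)
    shape = re.sub(r"\s+", " ", shape).strip().upper()
    return hashlib.sha256(shape.encode()).hexdigest()[:16]

a = sql_shape_hash("SELECT name FROM users WHERE id = 42")
b = sql_shape_hash("select name  from users where id = 7")
print(a == b)  # True: same shape, different literal values
```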
Raw Socket Network Effect Control
Network effect control is opt-in for direct Python raw-socket escapes. Enable it with:
REPLAYLAB_NETWORK_EFFECT_CONTROL_MODE=enforce
or with CLI wrappers:
replaylab run --network-effect-control-mode enforce -- ...
replaylab replay <capsule> --network-effect-control-mode enforce -- ...
replaylab workflow local <capsule> --network-effect-control-mode enforce -- ...
In observe mode, raw-socket hooks install only when network_effects is explicitly requested in
auto_patch_integrations. ReplayLab records secret-safe evidence for direct socket connect/send
attempts and allows execution. In enforce mode, child run and replay processes install the
hooks automatically and fail closed before app-origin raw socket I/O proceeds.
The evidence records effect kind, operation, socket family/type/protocol labels, display-safe endpoint host/port or endpoint label, nearest user-code origin, mode, status, timestamps, and duration. It never records payload bytes, socket data, source text, locals, arguments, return values, environment values, credentials, or absolute paths.
This is raw-socket escape control, not broad network or HTTP-client support. Supported
requests/httpx paths remain governed by HTTP effect policy control and are exempt from the
raw-socket hooks so they are not double-blocked. Unsupported HTTP clients have a separate escape guard described below.
Native/FFI escapes and sandbox guarantees remain unsupported scope blockers. Blocked raw socket
effects are not mocked.
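The enforce-mode fail-closed behavior can be sketched as a socket patch. This is a monkey-patching illustration, not ReplayLab's hook implementation; the exception type, evidence fields, and in-memory sink are assumptions:

```python
import socket

class BlockedNetworkEffect(RuntimeError):
    """Raised when enforce mode fails closed on an app-origin raw-socket connect."""

BLOCKED: list[dict] = []
_original_connect = socket.socket.connect  # kept so observe mode could delegate

def enforcing_connect(self, address):
    # Record display-safe endpoint evidence, then fail closed before any
    # bytes leave the process. Payload data is never captured.
    host = address[0] if isinstance(address, tuple) else str(address)
    BLOCKED.append({"kind": "raw_socket", "operation": "connect", "endpoint": host})
    raise BlockedNetworkEffect(f"raw-socket connect to {host!r} blocked by network-effect control")

socket.socket.connect = enforcing_connect

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.connect(("203.0.113.5", 9999))  # TEST-NET address; never reached
except BlockedNetworkEffect as exc:
    print(exc)
finally:
    s.close()
    socket.socket.connect = _original_connect  # restore the real connect
```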
Unsupported HTTP Client Control
Unsupported HTTP client control is opt-in for HTTP libraries that bypass ReplayLab's supported
requests and httpx replay path. Enable it with:
REPLAYLAB_UNSUPPORTED_HTTP_CLIENT_CONTROL_MODE=enforce
or with CLI wrappers:
replaylab run --unsupported-http-client-control-mode enforce -- ...
replaylab replay <capsule> --unsupported-http-client-control-mode enforce -- ...
replaylab workflow local <capsule> --unsupported-http-client-control-mode enforce -- ...
In observe mode, unsupported HTTP client hooks install only when unsupported_http_clients is
explicitly requested in auto_patch_integrations. ReplayLab records secret-safe evidence for
urllib, urllib3, and aiohttp request attempts and allows the original API to run. In
enforce mode, child run and replay processes install the hooks automatically, record an
unsupported HTTP client control block, and fail closed before app-origin network I/O proceeds.
The evidence records provider label, operation, method when safely derivable, display-safe host or resource label, nearest user-code origin, mode, status, timestamps, and duration. It never records headers, request bodies, response bodies, payload bytes, auth values, query secrets, environment values, locals, source text, return values, or absolute paths.
This is an escape guard, not an HTTP replay adapter. ReplayLab does not replay or mock urllib,
urllib3, or aiohttp responses in V1, and there is no allowlist policy for these clients. A
workflow that depends on one of these libraries remains ineligible for safe workflow regression
until it moves to supported requests/httpx capture or a future dedicated adapter exists.
Queue/PubSub Effect Control
Queue/PubSub effect control is opt-in for application attempts to enqueue or publish work to common Python queue and broker clients. Enable it with:
REPLAYLAB_QUEUE_EFFECT_CONTROL_MODE=enforce
or with CLI wrappers:
replaylab run --queue-effect-control-mode enforce -- ...
replaylab replay <capsule> --queue-effect-control-mode enforce -- ...
replaylab workflow local <capsule> --queue-effect-control-mode enforce -- ...
In observe mode, queue hooks install only when queue_effects is explicitly requested in
auto_patch_integrations. ReplayLab records secret-safe evidence for supported enqueue and publish
calls and allows the original API to run. In enforce mode, child run and replay processes
install the hooks automatically, record a queue-control block, and fail closed before app-origin
broker I/O proceeds.
V1 covers representative synchronous enqueue/publish APIs for Celery, RQ, Dramatiq, Kombu, Pika,
Kafka Python, and Confluent Kafka when those libraries are present. ReplayLab does not patch Python
stdlib queue, and capture_job(..., queue_name=...) remains job execution context rather than
broker I/O.
The evidence records provider label, operation, effect kind, display-safe queue/topic/routing-key label when one is safely derivable, nearest user-code origin, mode, status, timestamps, duration, and whether enforcement was active. It never records job args, kwargs, message bodies, broker URLs with credentials, headers, payloads, environment values, locals, source text, return values, or absolute paths.
This is enqueue/publish escape control, not queue replay or distributed-system safety. ReplayLab does not replay broker delivery, execute workers, inspect queue payloads, support every cloud pubsub SDK, provide queue allowlists, or mock blocked queue effects.
Sandboxed Replay Runtime
Sandboxed replay is opt-in and report-only in V1. Enable it with:
REPLAYLAB_SANDBOX_MODE=enforce
Prepare the default local runtime image before running sandboxed replay:
replaylab sandbox build-image --app-root .
replaylab sandbox doctor --app-root .
The doctor command reports structured setup checks such as Docker CLI availability, Docker daemon
availability, local image presence, and the hardened no-network import smoke test. When a check fails,
CLI JSON, human output, and local app action results show a sanitized next action such as building
the image, starting Docker, fixing the recipe, or using a custom --sandbox-image.
Projects with local package dependencies can add a bounded image recipe in pyproject.toml:
[tool.replaylab.sandbox]
image = "replaylab-sandbox-runtime:py3.13"
include_paths = ["packages/my_local_dependency"]
requirements_files = ["requirements.txt"]
apt_packages = ["libpq-dev"]
You can also pass --recipe path/to/sandbox.toml. Recipe paths must be app-root relative,
apt_packages are package names only, and private index values can be passed only through
BuildKit secrets for known pip/uv index environment variables. ReplayLab never stores or displays
those secret values.
Then run replay with CLI options:
replaylab replay <capsule> \
    --sandbox-mode enforce \
    --sandbox-backend local_container \
    --sandbox-image replaylab-sandbox-runtime:py3.13 \
    --sandbox-timeout-seconds 120 \
    -- ...
The V1 backend is local Docker container isolation. ReplayLab copies the recovered app workspace,
the local ReplayLab store, the source capsule, child-bootstrap code, and ReplayLab source roots into
a temporary workspace, then starts Docker as numeric user 65532:65532 with deny-all network,
read-only root filesystem, split read-only input mounts, a writable copied store/report output
mount, dropped Linux capabilities, no-new-privileges, process, memory, and CPU limits, bounded
tmpfs /tmp, and no host Docker socket.
Only ReplayLab runtime environment variables plus minimal runtime values such as PATH, HOME,
and UV_CACHE_DIR are passed through.
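The hardening properties above map roughly onto docker run flags. This mapping is illustrative: the concrete limit values, mount layout, and flag choices below are assumptions, not ReplayLab's exact invocation:

```python
def sandbox_run_args(image: str, workspace: str) -> list[str]:
    # Sketch of a hardened docker run command line; values are placeholders.
    return [
        "docker", "run", "--rm",
        "--network", "none",                       # deny-all network
        "--pull", "never",                         # never pull at replay time
        "--user", "65532:65532",                   # numeric non-root user
        "--read-only",                             # read-only root filesystem
        "--cap-drop", "ALL",                       # dropped Linux capabilities
        "--security-opt", "no-new-privileges",
        "--pids-limit", "256",                     # process limit (assumed value)
        "--memory", "1g",                          # memory limit (assumed value)
        "--cpus", "1",                             # CPU limit (assumed value)
        "--tmpfs", "/tmp:size=64m",                # bounded writable /tmp
        "-v", f"{workspace}/input:/workspace:ro",  # split read-only input mount
        "-v", f"{workspace}/out:/out:rw",          # writable store/report output
        image,
    ]

print(" ".join(sandbox_run_args("replaylab-sandbox-runtime:py3.13", "/tmp/ws")))
```

Note the absence of any host Docker socket mount and of `--env` passthrough beyond the minimal allowlist described above.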
Sandbox evidence is secret-safe: ReplayLab records mode, backend, image label, image id when available, runtime user, read-only root filesystem status, workspace mount policy, recipe source, recipe hash, deny-all network policy, copied-workspace filesystem policy, timeout, exit code, cleanup status, and a short message. It does not record environment values, absolute host paths, file contents, payloads, headers, source text, locals, arguments, or return values.
This is containment, not effect mocking or a replacement for the existing control chain. HTTP, local, SQLite, raw-socket, queue/pubsub, and unsupported HTTP client controls still decide what is allowed or blocked. The sandbox contains the replay process so missed effects cannot inherit host secrets, mutate the host app workspace, or use the host network by default.
The local Docker image must already be available and contain the runtime dependencies needed by the
app and ReplayLab because V1 replay runs with --network none and --pull never. The image builder
may pull the base image and install dependencies during setup; the replay container does not. The
builder installs ReplayLab's runtime packages into the image, detects app dependencies from uv.lock plus
pyproject.toml, requirements.txt, a bounded sandbox recipe, or a ReplayLab-only app, and refuses
local path dependencies unless a recipe includes the needed app-root-relative package paths. Docker
missing, image missing, image startup failure, timeout, cleanup failure, or older non-hardened
sandbox evidence is surfaced as sandbox evidence and blocks safe workflow readiness.
ReplayLab also keeps an adversarial sandbox scenario for developer validation. It checks bounded
escape probes such as external symlinks, absolute host-path command arguments, inherited
environment markers, Docker socket visibility, deny-network raw sockets, read-only app/root
filesystem writes, writable /tmp, and linked process-escape source evidence. Passing that
scenario means these probes are refused, contained, or reported as readiness blockers; it is still
not a VM, microVM, or managed hosted sandbox guarantee.
Daytona and other managed hosted sandbox providers are not part of V1. The sandbox contract is backend-shaped so a future hosted implementation can provide equivalent evidence without becoming foundational to the local SDK.
Unsupported Effect Scope
The safety preflight also scans the recovered local app root for unsupported effect surfaces without
importing or executing user code. V1 detects representative database clients, queues/pubsub
libraries, raw sockets, unsupported HTTP clients, native/FFI escapes, and cross-process escape APIs
such as multiprocessing, ProcessPoolExecutor, os.fork, os.exec*, os.spawn*,
os.posix_spawn*, and pty.spawn. Plain import os is not treated as a blocker by itself. The scan
also flags captured or replayed boundary kinds that are outside the current controlled provider, HTTP,
execution-tool, and local-effect chain. SQLite code with matching database-control evidence is
shown as supported scope evidence instead of blocking readiness. Direct raw-socket code with
enforced network-effect control evidence is also shown as controlled scope evidence instead of
silently escaping coverage.
Supported queue/pubsub imports and calls are shown as controlled scope evidence only when
queue-effect enforcement hooks were active and no app-origin queue effects were observed or blocked.
Linked urllib, urllib3, and aiohttp evidence is shown as controlled scope evidence only when
unsupported HTTP client enforcement hooks were active and no app-origin unsupported HTTP attempts
were observed or blocked.
Detection is scope evidence, not enforcement. Native/FFI and process-escape findings are not sandboxed or blocked by this guard; they make safe workflow generation unavailable when linked to the workflow scope. Linked evidence blocks safe workflow readiness only when it appears in the current workflow path: the resolved model-tool candidate source, explicit execution-tool wrapper source, or primary HTTP stack source. Unsupported imports elsewhere in the project appear as informational warnings so ReplayLab does not silently overclaim, but they do not block the specific workflow by themselves.
For report-derived safe workflow generation, the unsupported-effect scan must complete. If ReplayLab cannot recover an app root or scan limits are reached, generation stays unavailable because the workflow scope was not fully checked.
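A no-execution import scan of this kind can be sketched with ast. The detection list below is a small illustrative subset; ReplayLab's real surface list is broader and its matching is more nuanced:

```python
import ast

# Illustrative subset only; note that plain `os` is deliberately absent.
UNSUPPORTED_ROOTS = {"multiprocessing", "ctypes", "pty", "pika", "aiohttp"}

def scan_unsupported_imports(source: str) -> set[str]:
    # Parse with ast only; the scanned module is never imported or executed.
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots = {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            roots = {node.module.split(".")[0]}
        else:
            continue
        found |= roots & UNSUPPORTED_ROOTS
    return found

# Flags multiprocessing and aiohttp; plain `import os` is not flagged.
print(scan_unsupported_imports(
    "import os\nimport multiprocessing\nfrom aiohttp import ClientSession\n"
))
```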
Safe Workflow Readiness
The safety preflight now includes a Safe workflow readiness gate. The gate summarizes whether
ReplayLab has enough controlled evidence for safe workflow regression generation and lists each
requirement as satisfied, blocked, unknown, or not applicable.
V1 remains conservative. Provider replay, model tool visibility, implementation candidates,
explicit execution-tool wrapper evidence, tool effect maps, saved policy review, opt-in HTTP
enforcement, local-effect enforcement, and SQLite database-effect enforcement when SQLite statements
exist must all be satisfied for generation. Network-effect enforcement must also be active so direct
raw-socket escapes fail closed. Queue/PubSub enforcement must also be active so enqueue/publish
escapes fail closed. Unsupported HTTP client enforcement must also be active and clean so
urllib/urllib3/aiohttp escapes cannot bypass supported HTTP policy control. Report-derived
safe workflow generation also requires completed local-container sandbox evidence with deny-all
network, non-root execution, read-only root filesystem, split read-only/writable mounts,
copied-workspace filesystem isolation, and successful cleanup. Older sandbox reports remain
inspectable but must be rerun through the hardened runtime before generation. Captured-run views
and incomplete reports stay unavailable. A report preflight reaches ready only when the
controlled evidence chain is complete, hardened sandbox containment has completed, unsupported-effect
scope detection is clear, and there are no blocked effects.
Readiness is still a gate. Linked native/FFI or process-escape evidence keeps a report at
not_ready because those paths can bypass the current monkey-patched controls.
ready_but_generation_disabled means the evidence is useful, but that artifact is not a supported
generation source. ready sets
can_generate_safe_workflow_regression=true and enables report-driven generation.
Safe Workflow Regression
Safe workflow regression is report-driven. The generated pytest copies the source capsule fixture
and reviewed project effect policy fixture, installs the policy into a temporary .replaylab store,
and reruns replaylab replay with HTTP effect policy enforcement, local-effect control enforcement,
SQLite database-effect control enforcement, raw-socket network-effect control enforcement, and
queue/pubsub effect control enforcement enabled. It also runs with unsupported HTTP client control
enforcement enabled and local-container sandbox mode enforce. Generation is refused if the source
report does not already include completed sandbox evidence, if unsupported-effect scope detection is
limited, or if it finds blocking linked native/FFI, process-escape, or other unsupported evidence.
The test fails if readiness drops below ready, if sandbox containment fails, if HTTP, local,
SQLite database, raw socket network, queue/pubsub, or unsupported HTTP client effects are blocked,
or if replay introduces blocked, mismatched, extra, missing, or payload-unavailable rows.
Provider replay guards remain available for provider-boundary regression checks. Diagnostic provider replay guards remain available for known failure shapes. Safe workflow regression does not add mocking, framework adapters, managed hosted execution, VM or microVM guarantees, or support for broad database backends, broker delivery, worker execution, unsupported queue/pubsub SDKs, unsupported HTTP client response contracts, native/FFI escapes, cross-process escape APIs, or arbitrary operating-system effects beyond the SQLite statement-shape, raw-socket escape, enqueue/publish, unsupported HTTP client escape, and local-container containment controls described above.