Replay Safety
ReplayLab currently separates provider replay from full workflow safety.
Provider Replay Guard
A provider replay guard runs the application again while ReplayLab serves recorded provider responses for captured provider boundaries. This verifies that the current code path still consumes the captured provider-boundary sequence.
Provider replay guards do not prove that application tools, database writes, HTTP side effects, file writes, subprocesses, or other lower-level I/O were controlled.
Safety Preflight
The local app shows a read-only replay safety preflight for captured runs and regression replay reports. The preflight explains:
- which provider boundaries ReplayLab can replay
- which provider-facing model tools were declared
- which tool requests came from the provider protocol
- which provider protocol tool results were observed
- which local Python callables are likely implementation candidates when a safe app root is available
- which explicitly wrapped Python tool callables actually ran through ReplayLab's SDK wrapper
- which HTTP effects were observed with sanitized runtime stack origins
- which model tools, candidate callables, and observed HTTP effects can be linked by source and stack evidence
- which read-only policy review items would need to be resolved before effect control can exist
- which project effect policy decisions have already been saved for those review items
- which HTTP effects were allowed or blocked by opt-in project policy enforcement
- which filesystem mutation and subprocess hooks were active, observed, or blocked by opt-in local-effect control
- which SQLite database statements were observed, proposed for review, allowed, or blocked by opt-in database-effect control
- which direct raw-socket network attempts were observed or blocked by opt-in network-effect control
- which queue/pubsub enqueue or publish attempts were observed or blocked by opt-in queue/pubsub effect control
- which unsupported HTTP client attempts were observed or blocked by opt-in unsupported HTTP client control
- which sandbox containment evidence exists for regression replay reports
- whether the current evidence satisfies the read-only safe workflow readiness gate
- whether safe workflow regression is available, or which blockers keep it unavailable
For OpenAI Responses captures with tool calls, ReplayLab labels protocol evidence as LLM requested
tool and Provider protocol tool result. These labels do not mean ReplayLab captured the Python
callable that ran the tool.
Tool Resolution
ReplayLab can statically inspect local Python source for likely implementations of provider-visible
model tools. The local app shows these rows as Tool implementation candidates.
Resolution is advisory. ReplayLab parses source with ast; it does not import or execute app code,
does not infer side-effect class from names, and does not treat a high-confidence match as controlled
execution. See Tool Resolution for the resolver boundary.
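ReplayLab's resolver internals are not documented here; as a rough sketch of what ast-based, name-only candidate matching looks like (the helper and sample source below are illustrative, not ReplayLab's code):

```python
import ast

def find_tool_candidates(source: str, tool_names: set[str]) -> list[tuple[str, int]]:
    # Parse only; the inspected module is never imported or executed.
    tree = ast.parse(source)
    candidates = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name in tool_names:
            candidates.append((node.name, node.lineno))
    return candidates

sample = "def lookup_customer(customer_id):\n    return None\n"
print(find_tool_candidates(sample, {"lookup_customer"}))  # [('lookup_customer', 1)]
```

A match found this way says nothing about side-effect class or controlled execution, which is why resolution stays advisory.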
Execution Tool Control
Applications can opt into explicit execution-tool control evidence by wrapping a local Python callable:
import replaylab

controlled_lookup_customer = replaylab.control_tool(
    lookup_customer,
    name="lookup_customer",
    model_tool_name="lookup_customer",
)
The same wrapper is available from a handle:
controlled_lookup_customer = handle.control_tool(
    lookup_customer,
    name="lookup_customer",
    model_tool_name="lookup_customer",
)
The wrapper supports sync and async callables. It preserves the callable's return value and original
exception behavior. When the callable completes or fails, ReplayLab records a TOOL boundary with
secret-safe execution-control metadata: model tool id/name when supplied, callable module and
qualified name, optional app-root-relative source path, line number, start/end timestamps, duration,
success/failure status, and argument names only.
ReplayLab does not record argument values, return values, locals, source text, payload bodies,
headers, environment values, or absolute paths. V1 is explicit only: ReplayLab does not auto-wrap
resolver candidates, add decorators, add framework adapters, sandbox execution, or enforce tool
policies. The local app shows this as Execution tool control, which means the callable ran through
ReplayLab's wrapper. It does not mean the whole workflow is safe.
HTTP Effect Attribution
For HTTP calls captured through the requests and httpx libraries, ReplayLab records sanitized runtime stack evidence on the
captured HTTP boundary. The local app shows this as Observed HTTP effects, including the HTTP
method, host, nearest user-code function, optional app-root-relative source path, and attribution
status.
This evidence is secret-safe and intentionally narrow: no source text, locals, arguments, return values, environment values, headers, payload bodies, or absolute paths are shown in the app or API payload. Stack attribution is advisory evidence unless opt-in HTTP effect policy enforcement is enabled.
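The "nearest user-code function" part of that evidence can be sketched from a sanitized stack summary. This helper and the fabricated frames are illustrative assumptions, not ReplayLab's attribution code:

```python
import os
import traceback

def nearest_user_frame(stack: list[traceback.FrameSummary], app_root: str):
    # Walk the stack innermost-first; return (app-root-relative path, function
    # name) for the first frame under the app root. No locals or source text.
    for frame in reversed(stack):
        if frame.filename.startswith(app_root + os.sep):
            return os.path.relpath(frame.filename, app_root), frame.name
    return None  # attribution status: no user-code frame found

stack = [
    traceback.FrameSummary("/app/main.py", 10, "main"),
    traceback.FrameSummary("/app/tools/crm.py", 42, "lookup_customer"),
    traceback.FrameSummary("/site-packages/requests/api.py", 59, "get"),
]
print(nearest_user_frame(stack, "/app"))  # ('tools/crm.py', 'lookup_customer') on POSIX
```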
Tool Effect Map
When a capture has all three evidence layers, ReplayLab can show a read-only Tool effect map:
- a provider-visible model tool
- a likely local implementation candidate
- an observed HTTP effect whose sanitized stack frame points to that candidate
This link is built from source paths and qualified names only. It does not inspect raw payloads, headers, locals, arguments, return values, source text, or absolute paths.
The map is still advisory. It helps reviewers understand the likely chain from model tool intent to runtime HTTP I/O, but it does not mean ReplayLab captured the Python tool call, controlled its execution, blocked the HTTP call, or enforced a safety policy.
Effect Policy Proposal
When HTTP effects are present, ReplayLab can show an Effect policy proposal in the safety
preflight. A proposal item is a review prompt for future effect control: it points at the observed
HTTP effect, the mapped model tool and candidate when available, and the policy decision a user
would need to review later.
Policy proposals are read-only. ReplayLab does not persist a policy file, enforce the proposal, block or replay HTTP effects, sandbox execution, or treat the proposal as proof that Python execution was controlled.
Effect Policy Review
The local app can save project-scoped effect policy review decisions under
.replaylab/app/effect-policies/{project_id}.json. Saved rules copy immutable evidence from the
proposal item and let the user edit only the decision, review status, and a short note.
Saved effect policy is not enforcement by itself. It becomes input to opt-in HTTP effect policy
control only when capture or replay is run with HTTP effect policy mode set to enforce.
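The on-disk rule shape is not specified here; a hypothetical example of what a project-scoped policy file could contain (every field name below is an illustrative assumption, not ReplayLab's schema):

```json
{
  "project_id": "example-project",
  "rules": [
    {
      "decision": "allow_observed_effect",
      "review_status": "accepted",
      "note": "CRM lookup is read-only",
      "evidence": {
        "method": "GET",
        "host": "api.example.com",
        "resource": "/customers/{id}",
        "side_effect_class": "read",
        "source_path": "tools/crm.py",
        "qualname": "lookup_customer"
      }
    }
  ]
}
```

The evidence fields are copied from the proposal item and stay immutable; only the decision, review status, and note are user-edited.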
HTTP Effect Control
HTTP effect policy control is opt-in and HTTP-only. Enable it with:
REPLAYLAB_HTTP_EFFECT_POLICY_MODE=enforce
or with CLI wrappers:
replaylab run --http-effect-policy-mode enforce -- ...
replaylab replay <capsule> --http-effect-policy-mode enforce -- ...
replaylab workflow local <capsule> --http-effect-policy-mode enforce -- ...
In observe mode, ReplayLab records control evidence but does not block. In enforce mode,
ReplayLab checks sanitized runtime HTTP evidence against saved project policy rules before allowing
the requests or httpx path to proceed. A rule allows an HTTP effect only when the saved decision
is allow_observed_effect, the rule status is accepted or edited, the rule is not ambiguous,
and the sanitized method, host, resource, side-effect class, and available source/qualname evidence
match.
Unmatched, missing, unaccepted, or ambiguous policy evidence fails closed in enforce mode.
Captured runs record a blocked HTTP boundary without response payload; regression replays report a
blocked replay result instead of serving the recorded HTTP payload.
This is still not full workflow safety. HTTP control does not prove ReplayLab captured or controlled the Python tool execution, does not sandbox the app, and does not enable safe workflow regression.
Local Effect Control
Local effect control is opt-in for filesystem mutations and subprocess launches. Enable it with:
REPLAYLAB_LOCAL_EFFECT_CONTROL_MODE=enforce
or with CLI wrappers:
replaylab run --local-effect-control-mode enforce -- ...
replaylab replay <capsule> --local-effect-control-mode enforce -- ...
replaylab workflow local <capsule> --local-effect-control-mode enforce -- ...
In observe mode, ReplayLab installs these hooks only when local_effects is explicitly requested
in auto_patch_integrations. When hooks are active in observe mode, app-origin filesystem
mutations and subprocess launches are recorded as secret-safe evidence and allowed. In enforce
mode, child run and replay processes install the hooks automatically and fail closed before
app-origin file mutations or subprocess launches proceed.
The evidence records effect kind, operation, optional app-root-relative path, subprocess executable
basename, nearest user-code origin, mode, status, timestamps, and duration. It never records file
contents, full command arguments, environment values, cwd absolute paths, source text, locals,
return values, headers, payloads, or secrets. ReplayLab-owned .replaylab writes are treated as
internal and allowed.
This is narrow local-effect control, not sandboxing. ReplayLab does not control database drivers, queues, raw sockets, native extensions, or arbitrary operating-system effects through these hooks, and it does not mock blocked effects.
SQLite Database Effect Control
Database effect control is opt-in and SQLite-only in V1. It covers standard-library sqlite3 and
synchronous SQLAlchemy SQLite usage that reaches the pysqlite driver. Enable it with:
REPLAYLAB_DATABASE_EFFECT_CONTROL_MODE=enforce
or with CLI wrappers:
replaylab run --database-effect-control-mode enforce -- ...
replaylab replay <capsule> --database-effect-control-mode enforce -- ...
replaylab workflow local <capsule> --database-effect-control-mode enforce -- ...
In observe mode, SQLite hooks install only when database_effects is explicitly requested in
auto_patch_integrations. ReplayLab records statement-shape evidence and allows execution. In
enforce mode, child run and replay processes install SQLite hooks automatically and fail
closed before a statement runs unless it exactly matches an accepted project database policy rule.
Policy matching uses the SQLite backend, display-safe database resource, operation class, normalized SQL shape hash, and available source path plus qualified name. ReplayLab never records raw SQL, parameter values, rows, database contents, connection strings with secrets, source text, locals, arguments, return values, environment values, or absolute paths.
This is not broad database support. Non-SQLite SQLAlchemy URLs, async SQLAlchemy, aiosqlite,
Postgres, MySQL, MongoDB, queues, native extensions, and sandbox guarantees remain unsupported
scope blockers. Blocked database effects are not mocked.
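A "normalized SQL shape hash" can be illustrated with a toy normalizer. ReplayLab's actual normalization rules are not documented here; the regex-based literal stripping below is an assumption that only conveys the idea that literal values never reach the hash:

```python
import hashlib
import re

def sql_shape_hash(sql: str) -> str:
    # Replace string and numeric literals with placeholders, collapse
    # whitespace, and hash the result: only the statement *shape* survives.
    shape = re.sub(r"'[^']*'", "?", sql)
    shape = re.sub(r"\b\d+\b", "?", shape)
    shape = re.sub(r"\s+", " ", shape).strip().upper()
    return hashlib.sha256(shape.encode()).hexdigest()[:16]

a = sql_shape_hash("SELECT name FROM users WHERE id = 42")
b = sql_shape_hash("select name  from users where id = 7")
print(a == b)  # True: same shape, different literal values
```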
Raw Socket Network Effect Control
Network effect control is opt-in for direct Python raw-socket escapes. Enable it with:
REPLAYLAB_NETWORK_EFFECT_CONTROL_MODE=enforce
or with CLI wrappers:
replaylab run --network-effect-control-mode enforce -- ...
replaylab replay <capsule> --network-effect-control-mode enforce -- ...
replaylab workflow local <capsule> --network-effect-control-mode enforce -- ...
In observe mode, raw-socket hooks install only when network_effects is explicitly requested in
auto_patch_integrations. ReplayLab records secret-safe evidence for direct socket connect/send
attempts and allows execution. In enforce mode, child run and replay processes install the
hooks automatically and fail closed before app-origin raw socket I/O proceeds.
The evidence records effect kind, operation, socket family/type/protocol labels, display-safe endpoint host/port or endpoint label, nearest user-code origin, mode, status, timestamps, and duration. It never records payload bytes, socket data, source text, locals, arguments, return values, environment values, credentials, or absolute paths.
This is raw-socket escape control, not broad network or HTTP-client support. Supported
requests/httpx paths remain governed by HTTP effect policy control and are exempt from the
raw-socket hooks so they are not double-blocked. Unsupported HTTP clients have a separate escape guard described below.
Native/FFI escapes and sandbox guarantees remain unsupported scope blockers. Blocked raw socket
effects are not mocked.
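The enforce-mode fail-closed behavior can be sketched as a socket patch. This is a monkey-patching illustration, not ReplayLab's hook implementation; the exception type, evidence fields, and in-memory sink are assumptions:

```python
import socket

class BlockedNetworkEffect(RuntimeError):
    """Raised when enforce mode fails closed on an app-origin raw-socket connect."""

BLOCKED: list[dict] = []
_original_connect = socket.socket.connect  # kept so observe mode could delegate

def enforcing_connect(self, address):
    # Record display-safe endpoint evidence, then fail closed before any
    # bytes leave the process. Payload data is never captured.
    host = address[0] if isinstance(address, tuple) else str(address)
    BLOCKED.append({"kind": "raw_socket", "operation": "connect", "endpoint": host})
    raise BlockedNetworkEffect(f"raw-socket connect to {host!r} blocked by network-effect control")

socket.socket.connect = enforcing_connect

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.connect(("203.0.113.5", 9999))  # TEST-NET address; never reached
except BlockedNetworkEffect as exc:
    print(exc)
finally:
    s.close()
    socket.socket.connect = _original_connect  # restore the real connect
```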
Unsupported HTTP Client Control
Unsupported HTTP client control is opt-in for HTTP libraries that bypass ReplayLab's supported
requests and httpx replay path. Enable it with:
REPLAYLAB_UNSUPPORTED_HTTP_CLIENT_CONTROL_MODE=enforce
or with CLI wrappers:
replaylab run --unsupported-http-client-control-mode enforce -- ...
replaylab replay <capsule> --unsupported-http-client-control-mode enforce -- ...
replaylab workflow local <capsule> --unsupported-http-client-control-mode enforce -- ...
In observe mode, unsupported HTTP client hooks install only when unsupported_http_clients is
explicitly requested in auto_patch_integrations. ReplayLab records secret-safe evidence for
urllib, urllib3, and aiohttp request attempts and allows the original API to run. In
enforce mode, child run and replay processes install the hooks automatically, record an
unsupported HTTP client control block, and fail closed before app-origin network I/O proceeds.
The evidence records provider label, operation, method when safely derivable, display-safe host or resource label, nearest user-code origin, mode, status, timestamps, and duration. It never records headers, request bodies, response bodies, payload bytes, auth values, query secrets, environment values, locals, source text, return values, or absolute paths.
This is an escape guard, not an HTTP replay adapter. ReplayLab does not replay or mock urllib,
urllib3, or aiohttp responses in V1, and there is no allowlist policy for these clients. A
workflow that depends on one of these libraries remains ineligible for safe workflow regression
until it moves to supported requests/httpx capture or a future dedicated adapter exists.
Queue/PubSub Effect Control
Queue/PubSub effect control is opt-in for application attempts to enqueue or publish work to common Python queue and broker clients. Enable it with:
REPLAYLAB_QUEUE_EFFECT_CONTROL_MODE=enforce
or with CLI wrappers:
replaylab run --queue-effect-control-mode enforce -- ...
replaylab replay <capsule> --queue-effect-control-mode enforce -- ...
replaylab workflow local <capsule> --queue-effect-control-mode enforce -- ...
In observe mode, queue hooks install only when queue_effects is explicitly requested in
auto_patch_integrations. ReplayLab records secret-safe evidence for supported enqueue and publish
calls and allows the original API to run. In enforce mode, child run and replay processes
install the hooks automatically, record a queue-control block, and fail closed before app-origin
broker I/O proceeds.
V1 covers representative synchronous enqueue/publish APIs for Celery, RQ, Dramatiq, Kombu, Pika,
Kafka Python, and Confluent Kafka when those libraries are present. ReplayLab does not patch Python
stdlib queue, and capture_job(..., queue_name=...) remains job execution context rather than
broker I/O.
The evidence records provider label, operation, effect kind, display-safe queue/topic/routing-key label when one is safely derivable, nearest user-code origin, mode, status, timestamps, duration, and whether enforcement was active. It never records job args, kwargs, message bodies, broker URLs with credentials, headers, payloads, environment values, locals, source text, return values, or absolute paths.
This is enqueue/publish escape control, not queue replay or distributed-system safety. ReplayLab does not replay broker delivery, execute workers, inspect queue payloads, support every cloud pubsub SDK, provide queue allowlists, or mock blocked queue effects.
Sandboxed Replay Runtime
Sandboxed replay is opt-in and report-only in V1. Enable it with:
REPLAYLAB_SANDBOX_MODE=enforce
Prepare the default local runtime image before running sandboxed replay:
replaylab sandbox build-image --app-root .
replaylab sandbox doctor --app-root .
The doctor command reports structured setup checks such as Docker CLI availability, Docker daemon
availability, local image presence, and the hardened no-network import smoke test. When a check fails,
CLI JSON, human output, and local app action results show a sanitized next action such as building
the image, starting Docker, fixing the recipe, or using a custom --sandbox-image.
Projects with local package dependencies can add a bounded image recipe in pyproject.toml:
[tool.replaylab.sandbox]
image = "replaylab-sandbox-runtime:py3.13"
include_paths = ["packages/my_local_dependency"]
requirements_files = ["requirements.txt"]
apt_packages = ["libpq-dev"]
You can also pass --recipe path/to/sandbox.toml. Recipe paths must be app-root relative,
apt_packages are package names only, and private index values can be passed only through
BuildKit secrets for known pip/uv index environment variables. ReplayLab never stores or displays
those secret values.
Then run replay with CLI options:
replaylab replay <capsule> \
    --sandbox-mode enforce \
    --sandbox-backend local_container \
    --sandbox-image replaylab-sandbox-runtime:py3.13 \
    --sandbox-timeout-seconds 120 \
    -- ...
The V1 backend is local Docker container isolation. ReplayLab copies the recovered app workspace,
the local ReplayLab store, the source capsule, child-bootstrap code, and ReplayLab source roots into
a temporary workspace, then starts Docker as numeric user 65532:65532 with deny-all network,
read-only root filesystem, split read-only input mounts, a writable copied store/report output
mount, dropped Linux capabilities, no-new-privileges, process, memory, and CPU limits, bounded
tmpfs /tmp, and no host Docker socket.
Only ReplayLab runtime environment variables plus minimal runtime values such as PATH, HOME,
and UV_CACHE_DIR are passed through.
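The hardening properties above map roughly onto docker run flags. This mapping is illustrative: the concrete limit values, mount layout, and flag choices below are assumptions, not ReplayLab's exact invocation:

```python
def sandbox_run_args(image: str, workspace: str) -> list[str]:
    # Sketch of a hardened docker run command line; values are placeholders.
    return [
        "docker", "run", "--rm",
        "--network", "none",                       # deny-all network
        "--pull", "never",                         # never pull at replay time
        "--user", "65532:65532",                   # numeric non-root user
        "--read-only",                             # read-only root filesystem
        "--cap-drop", "ALL",                       # dropped Linux capabilities
        "--security-opt", "no-new-privileges",
        "--pids-limit", "256",                     # process limit (assumed value)
        "--memory", "1g",                          # memory limit (assumed value)
        "--cpus", "1",                             # CPU limit (assumed value)
        "--tmpfs", "/tmp:size=64m",                # bounded writable /tmp
        "-v", f"{workspace}/input:/workspace:ro",  # split read-only input mount
        "-v", f"{workspace}/out:/out:rw",          # writable store/report output
        image,
    ]

print(" ".join(sandbox_run_args("replaylab-sandbox-runtime:py3.13", "/tmp/ws")))
```

Note the absence of any host Docker socket mount and of `--env` passthrough beyond the minimal allowlist described above.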
Sandbox evidence is secret-safe: ReplayLab records mode, backend, image label, image id when available, runtime user, read-only root filesystem status, workspace mount policy, recipe source, recipe hash, deny-all network policy, copied-workspace filesystem policy, timeout, exit code, cleanup status, and a short message. It does not record environment values, absolute host paths, file contents, payloads, headers, source text, locals, arguments, or return values.
This is containment, not effect mocking or a replacement for the existing control chain. HTTP, local, SQLite, raw-socket, queue/pubsub, and unsupported HTTP client controls still decide what is allowed or blocked. The sandbox contains the replay process so missed effects cannot inherit host secrets, mutate the host app workspace, or use the host network by default.
The local Docker image must already be available and contain the runtime dependencies needed by the
app and ReplayLab because V1 replay runs with --network none and --pull never. The image builder
may pull the base image and install dependencies during setup; the replay container does not. The
builder installs ReplayLab's runtime packages into the image, detects app dependencies from uv.lock plus
pyproject.toml, requirements.txt, a bounded sandbox recipe, or a ReplayLab-only app, and refuses
local path dependencies unless a recipe includes the needed app-root-relative package paths. Docker
missing, image missing, image startup failure, timeout, cleanup failure, or older non-hardened
sandbox evidence is surfaced as sandbox evidence and blocks safe workflow readiness.
ReplayLab also keeps an adversarial sandbox scenario for developer validation. It checks bounded
escape probes such as external symlinks, absolute host-path command arguments, inherited
environment markers, Docker socket visibility, deny-network raw sockets, read-only app/root
filesystem writes, writable /tmp, and linked process-escape source evidence. Passing that
scenario means these probes are refused, contained, or reported as readiness blockers; it is still
not a VM, microVM, or managed hosted sandbox guarantee.
Daytona and other managed hosted sandbox providers are not part of V1. The sandbox contract is backend-shaped so a future hosted implementation can provide equivalent evidence without becoming foundational to the local SDK.
Unsupported Effect Scope
The safety preflight also scans the recovered local app root for unsupported effect surfaces without
importing or executing user code. V1 detects representative database clients, queues/pubsub
libraries, raw sockets, unsupported HTTP clients, native/FFI escapes, and cross-process escape APIs
such as multiprocessing, ProcessPoolExecutor, os.fork, os.exec*, os.spawn*,
os.posix_spawn*, and pty.spawn. Plain import os is not treated as a blocker by itself. The scan
also flags captured or replayed boundary kinds that are outside the current controlled provider, HTTP,
execution-tool, and local-effect chain. SQLite code with matching database-control evidence is
shown as supported scope evidence instead of blocking readiness. Direct raw-socket code with
enforced network-effect control evidence is also shown as controlled scope evidence instead of
silently escaping coverage.
Supported queue/pubsub imports and calls are shown as controlled scope evidence only when
queue-effect enforcement hooks were active and no app-origin queue effects were observed or blocked.
Linked urllib, urllib3, and aiohttp evidence is shown as controlled scope evidence only when
unsupported HTTP client enforcement hooks were active and no app-origin unsupported HTTP attempts
were observed or blocked.
Detection is scope evidence, not enforcement. Native/FFI and process-escape findings are not sandboxed or blocked by this guard; they make safe workflow generation unavailable when linked to the workflow scope. Linked evidence blocks safe workflow readiness only when it appears in the current workflow path: the resolved model-tool candidate source, explicit execution-tool wrapper source, or primary HTTP stack source. Unsupported imports elsewhere in the project appear as informational warnings so ReplayLab does not silently overclaim, but they do not block the specific workflow by themselves.
For report-derived safe workflow generation, the unsupported-effect scan must complete. If ReplayLab cannot recover an app root or scan limits are reached, generation stays unavailable because the workflow scope was not fully checked.
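A no-execution import scan of this kind can be sketched with ast. The detection list below is a small illustrative subset; ReplayLab's real surface list is broader and its matching is more nuanced:

```python
import ast

# Illustrative subset only; note that plain `os` is deliberately absent.
UNSUPPORTED_ROOTS = {"multiprocessing", "ctypes", "pty", "pika", "aiohttp"}

def scan_unsupported_imports(source: str) -> set[str]:
    # Parse with ast only; the scanned module is never imported or executed.
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots = {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            roots = {node.module.split(".")[0]}
        else:
            continue
        found |= roots & UNSUPPORTED_ROOTS
    return found

# Flags multiprocessing and aiohttp; plain `import os` is not flagged.
print(scan_unsupported_imports(
    "import os\nimport multiprocessing\nfrom aiohttp import ClientSession\n"
))
```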
Safe Workflow Readiness
The safety preflight now includes a Safe workflow readiness gate. The gate summarizes whether
ReplayLab has enough controlled evidence for safe workflow regression generation and lists each
requirement as satisfied, blocked, unknown, or not applicable.
V1 remains conservative. Provider replay, model tool visibility, implementation candidates,
explicit execution-tool wrapper evidence, tool effect maps, saved policy review, opt-in HTTP
enforcement, local-effect enforcement, and SQLite database-effect enforcement when SQLite statements
exist must all be satisfied for generation. Network-effect enforcement must also be active so direct
raw-socket escapes fail closed. Queue/PubSub enforcement must also be active so enqueue/publish
escapes fail closed. Unsupported HTTP client enforcement must also be active and clean so
urllib/urllib3/aiohttp escapes cannot bypass supported HTTP policy control. Report-derived
safe workflow generation also requires completed local-container sandbox evidence with deny-all
network, non-root execution, read-only root filesystem, split read-only/writable mounts,
copied-workspace filesystem isolation, and successful cleanup. Older sandbox reports remain
inspectable but must be rerun through the hardened runtime before generation. Captured-run views
and incomplete reports stay unavailable. A report preflight reaches ready only when the
controlled evidence chain is complete, hardened sandbox containment has completed, unsupported-effect
scope detection is clear, and there are no blocked effects.
Readiness is still a gate. Linked native/FFI or process-escape evidence keeps a report at
not_ready because those paths can bypass the current monkey-patched controls.
ready_but_generation_disabled means the evidence is useful, but that artifact is not a supported
generation source. ready sets
can_generate_safe_workflow_regression=true and enables report-driven generation.
Safe Workflow Regression
Safe workflow regression is report-driven. The generated pytest copies the source capsule fixture
and reviewed project effect policy fixture, installs the policy into a temporary .replaylab store,
and reruns replaylab replay with HTTP effect policy enforcement, local-effect control enforcement,
SQLite database-effect control enforcement, raw-socket network-effect control enforcement, and
queue/pubsub effect control enforcement enabled. It also runs with unsupported HTTP client control
enforcement enabled and local-container sandbox mode enforce. Generation is refused if the source
report does not already include completed sandbox evidence, if unsupported-effect scope detection is
limited, or if it finds blocking linked native/FFI, process-escape, or other unsupported evidence.
The test fails if readiness drops below ready, if sandbox containment fails, if HTTP, local,
SQLite database, raw socket network, queue/pubsub, or unsupported HTTP client effects are blocked,
or if replay introduces blocked, mismatched, extra, missing, or payload-unavailable rows.
Provider replay guards remain available for provider-boundary regression checks. Diagnostic provider replay guards remain available for known failure shapes. Safe workflow regression does not add mocking, framework adapters, managed hosted execution, VM or microVM guarantees, or support for broad database backends, broker delivery, worker execution, unsupported queue/pubsub SDKs, unsupported HTTP client response contracts, native/FFI escapes, cross-process escape APIs, or arbitrary operating-system effects beyond the SQLite statement-shape, raw-socket escape, enqueue/publish, unsupported HTTP client escape, and local-container containment controls described above.