ReplayLab

ReplayLab turns real agent failures into deterministic local replays and reusable provider replay guards.

Agent systems are hard to debug because the important behavior crosses nondeterministic boundaries: LLMs, HTTP APIs, tools, queues, files, and databases. ReplayLab captures those boundaries into local capsules so the same application command can be replayed without live provider calls and converted into a pytest provider replay guard.
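
The following minimal, self-contained Python sketch shows what recording one nondeterministic boundary and replaying it looks like. It is not ReplayLab's implementation or API. `Capsule`, `record`, `replay`, and `flaky_provider` are hypothetical names, and a real capsule is a strict on-disk schema rather than an in-memory list.

```python
import random
from typing import Any, Callable


class Capsule:
    """Hypothetical in-memory capsule: an ordered list of boundary records."""

    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []


def record(capsule: Capsule, boundary: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Call fn live and append each request/response pair to the capsule."""

    def wrapper(*args: Any) -> Any:
        response = fn(*args)
        capsule.records.append({"boundary": boundary, "request": list(args), "response": response})
        return response

    return wrapper


def replay(capsule: Capsule, boundary: str) -> Callable[..., Any]:
    """Serve recorded responses in order without ever calling the live function."""
    pending = iter([r for r in capsule.records if r["boundary"] == boundary])

    def wrapper(*args: Any) -> Any:
        rec = next(pending)
        # Fail loudly if the app now sends a different request than it did at capture time.
        if rec["request"] != list(args):
            raise AssertionError(f"request drift at {boundary}: {list(args)!r} != {rec['request']!r}")
        return rec["response"]

    return wrapper


def flaky_provider(prompt: str) -> str:
    # Stands in for an LLM or HTTP call whose output changes on every run.
    return f"{prompt} -> {random.randint(0, 1_000_000)}"


def app(call: Callable[[str], str]) -> list[str]:
    # The same application code runs in both modes; it only sees a callable.
    return [call("classify"), call("summarize")]


capsule = Capsule()
live_output = app(record(capsule, "llm", flaky_provider))
replayed_output = app(replay(capsule, "llm"))
```

The replay side also checks that each request matches the captured one. That is one plausible way to surface drift instead of silently returning a stale response.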

The current local MVP supports Python apps using OpenAI Responses, Anthropic Messages, Gemini Generate Content, requests, sync httpx, and async httpx. Core OpenAI, Anthropic, and Gemini text streams are captured as event-preserving, replayable LLM evidence when the stream is fully consumed.

The core product loop is:

capture -> choose regression replay or live experiment -> compare -> diagnose -> generate regression -> re-run
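
The stream rule stated above can be sketched with a wrapping iterator: event-preserving evidence is kept only once a stream is fully consumed. This is an illustrative assumption about the mechanism, not ReplayLab code. `RecordingStream` and `replay_stream` are hypothetical, and plain strings stand in for provider stream events.

```python
from typing import Iterable, Iterator, Optional


class RecordingStream:
    """Hypothetical stream wrapper: yields provider events and keeps them as evidence
    only after the consumer has exhausted the stream."""

    def __init__(self, events: Iterable[str]) -> None:
        self._source = iter(events)
        self._seen: list[str] = []
        self.evidence: Optional[list[str]] = None  # stays None for partially read streams

    def __iter__(self) -> "RecordingStream":
        return self

    def __next__(self) -> str:
        try:
            event = next(self._source)
        except StopIteration:
            # The consumer reached the end: freeze the exact event sequence.
            self.evidence = list(self._seen)
            raise
        self._seen.append(event)
        return event


def replay_stream(evidence: list[str]) -> Iterator[str]:
    """Re-emit recorded events one at a time, preserving event boundaries."""
    yield from evidence


full = RecordingStream(["Hel", "lo", "!"])
text = "".join(full)  # fully consumed, so evidence is recorded

partial = RecordingStream(["a", "b", "c"])
first = next(partial)  # consumer stops early, so nothing is recorded
```

Keeping the individual events, not just the joined text, is what lets a replayed consumer observe the same chunk boundaries it saw live.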

Start Here

Why ReplayLab?: understand the problem and product loop.
Tutorials: learn with real-provider walkthroughs, ASGI middleware, and matching notebooks.
Quickstart: run the deterministic dogfood workflow end to end.
Integration Model: understand startup instrumentation, capture scopes, replay mode, and replaylab run.
Local Artifacts: understand capsules, full-payload capture, safe replay, and generated tests.
Support Matrix: see the public-alpha supported provider and replay surface.
Documentation Coverage: see how public capabilities map to docs, examples, and tests.
MVP Limitations: understand what is supported and what is intentionally out of scope.
Release Checklist: verify published alpha installs and future release candidates.
ADRs: architecture decisions captured so far.

For the public-alpha learning path, start with the OpenAI Responses tutorial, the HTTP tutorial, the ASGI/FastAPI middleware tutorial, or the worker job tutorial. If your app uses an agent framework, use the PydanticAI, LangGraph, LangChain, OpenAI Agents SDK, LlamaIndex, or CrewAI compatibility tutorial to see the currently validated provider-level workflow before assuming you need a framework-specific adapter.

After capture or replay, use replaylab app --local-store-root .replaylab as the default human UI. It starts a local app that listens only on localhost and projects the local store into product artifacts: captured runs with attached regression replays, live experiments, provider replay guards, and exports. Internal capsules and reports remain compatibility and debugging details. The app resolves report/captured-run pairs automatically, shows the first diagnosis before the dense boundary tables, and can run regression replay, live experiment, compare, export, and optional pytest generation from the selected product object.

Use replaylab workflow local when you want the same replay/compare/viewer/pytest chain from a script. If you need a shareable single file, open the Local React Viewer. Use AI-Assisted Diagnosis for optional BYOK explanations and instrumentation plans generated from secret-safe summaries.

For existing applications, the intended integration model is startup-level SDK instrumentation: initialize ReplayLab once near app startup, configure provider auto-patching, and keep your normal provider client code. The CLI wrapper remains a tool for local runs, CI, and provider replay guards rather than a required production deployment command.
Maintainer scenarios are the development feedback loop for this model. The richer ASGI and job lifecycle scenarios validate ignored and provider-free work, safe metadata, replay, comparison, and generated provider replay guards before those workflows are treated as release-ready. Framework compatibility scenarios do the same for PydanticAI, LangChain, LangGraph, OpenAI Agents SDK, LlamaIndex, and CrewAI without adding those frameworks as ReplayLab dependencies. Native provider scenarios do the same for Anthropic Messages, Gemini generate_content, and core provider streaming.

import replaylab
from replaylab import CapturePayloadPolicy

handle = replaylab.init(
    project_name="support-bot",
    auto_patch_integrations="auto",
    capture_payload_policy=CapturePayloadPolicy.FULL,
)

"auto" enables all supported provider and framework patchers in stable order: OpenAI, Anthropic, Gemini, requests, httpx, PydanticAI tool dispatch, LangChain tool dispatch, LangGraph ToolNode dispatch, OpenAI Agents SDK function-tool dispatch, LlamaIndex FunctionTool dispatch, and CrewAI custom-tool dispatch. Use explicit tuples such as ("openai", "langchain"), ("openai", "langchain", "langgraph"), ("openai", "openai_agents"), ("openai", "llama_index"), or ("openai", "crewai") when you want a smaller production patch surface.

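The "auto" expansion and explicit tuples can be pictured as a small resolver. This sketch is hypothetical. `resolve_patchers` is not a ReplayLab function, and only six of the identifiers below appear in this README. Normalizing explicit tuples into the stable order is the sketch's own choice rather than documented behavior.

```python
from typing import Union

# Assumed integration identifiers. Only "openai", "langchain", "langgraph",
# "openai_agents", "llama_index", and "crewai" appear in the examples above;
# the other five names are hypothetical placeholders.
STABLE_ORDER: tuple[str, ...] = (
    "openai", "anthropic", "gemini", "requests", "httpx",
    "pydantic_ai", "langchain", "langgraph", "openai_agents", "llama_index", "crewai",
)


def resolve_patchers(selection: Union[str, tuple[str, ...]]) -> tuple[str, ...]:
    """Expand "auto" or validate an explicit tuple, returning names in stable order."""
    if isinstance(selection, str):
        if selection != "auto":
            raise ValueError(f'expected "auto" or a tuple of names, got {selection!r}')
        return STABLE_ORDER
    unknown = sorted(set(selection) - set(STABLE_ORDER))
    if unknown:
        raise ValueError(f"unknown integrations: {unknown}")
    # Sketch design choice: explicit selections are applied in the same stable order.
    return tuple(name for name in STABLE_ORDER if name in selection)


print(resolve_patchers(("langgraph", "openai")))  # ('openai', 'langgraph')
```

Rejecting a bare string other than "auto" avoids the common Python pitfall of treating "openai" as a tuple of single characters.
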
For request, job, or session boundaries, use a capture scope around the work you want grouped into one capsule. Provider calls inside the scope are captured by the startup instrumentation and attach to the scope's default step.

with handle.capture(
    "classify_ticket",
    session_id=ticket_id,
    labels=("request",),
    runtime_metadata={"ticket.id": ticket_id},
) as capture:
    classify_ticket(ticket)

capsule = capture.capsule

For ASGI apps such as FastAPI or Starlette, instrument the app once so ReplayLab opens that capture scope automatically for each request with provider work:

replaylab.instrument_app(app, handle=handle, ignored_paths=("/health",))
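
The per-request behavior can be approximated with a plain ASGI middleware. This is a hypothetical sketch, not ReplayLabASGIMiddleware. It opens a scope for every non-ignored HTTP request, whereas the documented integration keeps only requests with provider work. `opened_scopes` stands in for real capsules. Note that ASGI's own `scope` dict is unrelated to a ReplayLab capture scope.

```python
import asyncio
from typing import Any, Awaitable, Callable

Message = dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[dict[str, Any], Receive, Send], Awaitable[None]]


class CaptureScopeMiddleware:
    """Hypothetical ASGI middleware: one capture scope per non-ignored HTTP request."""

    def __init__(self, app: ASGIApp, ignored_paths: tuple[str, ...] = ()) -> None:
        self.app = app
        self.ignored_paths = frozenset(ignored_paths)
        self.opened_scopes: list[str] = []  # stands in for real capsules

    async def __call__(self, scope: dict[str, Any], receive: Receive, send: Send) -> None:
        # Pass lifespan/websocket traffic and ignored paths straight through.
        if scope["type"] != "http" or scope["path"] in self.ignored_paths:
            await self.app(scope, receive, send)
            return
        self.opened_scopes.append(f"{scope['method']} {scope['path']}")
        await self.app(scope, receive, send)


async def hello_app(scope: dict[str, Any], receive: Receive, send: Send) -> None:
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def request(app: ASGIApp, path: str) -> list[Message]:
    """Drive one fake GET request through an ASGI app and collect sent messages."""
    sent: list[Message] = []

    async def receive() -> Message:
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message: Message) -> None:
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent


middleware = CaptureScopeMiddleware(hello_app, ignored_paths=("/health",))
asyncio.run(request(middleware, "/health"))
responses = asyncio.run(request(middleware, "/tickets/42"))
```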

For background workers, decorate the job entrypoint so each job invocation becomes one capture scope:

import requests
from replaylab.integrations.jobs import capture_job


@capture_job(handle=handle, name="sync_ticket", session_id_arg="ticket_id")
def sync_ticket(ticket_id: str) -> dict[str, str]:
    response = requests.get(f"https://support.example.test/tickets/{ticket_id}", timeout=5)
    response.raise_for_status()
    return {"ticket_id": ticket_id}
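
If the decorator binds arguments with `inspect.signature`, mapping `session_id_arg` to a call argument works the same whether the job is called positionally or by keyword. The sketch below shows that approach under that assumption. `capture_job_sketch` and `JOB_LOG` are hypothetical and do not reflect how capture_job is actually implemented.

```python
import functools
import inspect
from typing import Any, Callable, Optional, TypeVar

F = TypeVar("F", bound=Callable[..., Any])
JOB_LOG: list[dict[str, Any]] = []  # stands in for one capsule per job invocation


def capture_job_sketch(name: str, session_id_arg: Optional[str] = None) -> Callable[[F], F]:
    """Hypothetical decorator: every call of the wrapped job becomes one log entry."""

    def decorator(fn: F) -> F:
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # bind() maps positional and keyword calls to parameter names alike.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            session_id = bound.arguments.get(session_id_arg) if session_id_arg else None
            entry = {"job": name, "session_id": session_id, "status": "running"}
            JOB_LOG.append(entry)
            try:
                result = fn(*args, **kwargs)
            except Exception:
                entry["status"] = "failed"
                raise
            entry["status"] = "ok"
            return result

        return wrapper  # type: ignore[return-value]

    return decorator


@capture_job_sketch(name="sync_ticket", session_id_arg="ticket_id")
def sync_ticket(ticket_id: str) -> dict[str, str]:
    return {"ticket_id": ticket_id}


sync_ticket("T-42")
sync_ticket(ticket_id="T-43")
```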

There is no background listener in this flow. ReplayLab records provider calls because init(...) installs provider wrappers in the current Python process. When the app runs under replaylab replay, it keeps the same startup initialization: capture scopes become no-ops, and the CLI-owned replay runtime serves provider calls from the capsule.
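
Because the wrappers live in the same process, recording and replaying amount to replacing the provider function the app calls. The sketch below patches a fake module in place. `fake_sdk`, `init_sketch`, and `handle` are hypothetical, and real auto-patching of provider SDKs is more involved.

```python
import types

# A stand-in provider SDK module; the real thing would be an installed client library.
fake_sdk = types.ModuleType("fake_sdk")
fake_sdk.live_calls = 0


def _live_create(prompt: str) -> str:
    fake_sdk.live_calls += 1  # counts calls that would hit the real provider
    return f"answer to {prompt!r} #{fake_sdk.live_calls}"


fake_sdk.create = _live_create


def init_sketch(mode: str, capsule: list[dict[str, str]]) -> None:
    """Hypothetical init(): replace fake_sdk.create in this process; no listener involved."""
    if mode == "record":
        def create(prompt: str) -> str:
            response = _live_create(prompt)
            capsule.append({"prompt": prompt, "response": response})
            return response
    elif mode == "replay":
        recorded = iter(capsule)

        def create(prompt: str) -> str:
            return next(recorded)["response"]  # never touches _live_create
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    fake_sdk.create = create


def handle(prompt: str) -> str:
    # Normal client code: the attribute is looked up at call time, so the patch applies.
    return fake_sdk.create(prompt)


capsule: list[dict[str, str]] = []
init_sketch("record", capsule)
recorded_answer = handle("reset my password")
live_calls_after_capture = fake_sdk.live_calls

init_sketch("replay", capsule)  # same startup step, different mode
replayed_answer = handle("reset my password")
```

The patch only affects code that looks the function up at call time. A reference taken earlier (for example `from fake_sdk import create` executed before patching) would keep pointing at the original, which is consistent with the advice to initialize near app startup.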

If you do not have provider credentials, use the deterministic no-network dogfood app. This fallback intentionally uses replaylab run because it exercises the local/CI wrapper path, not because production deployments need to start that way:

uv run replaylab run \
  --project-name dogfood-mvp \
  --auto-patch-integrations auto \
  --capture-payload-policy full \
  -- python examples/dogfood_mvp/app.py

uv run replaylab capsule list --local-store-root .replaylab

Use the child provider capsule from the list output for replay, comparison, and regression generation.

What Exists Now

Strict local capsule schemas and content-addressed payload files.
Capture-time redaction for explicit payload capture.
ASGI/FastAPI request lifecycle capture through instrument_app(...) or direct ReplayLabASGIMiddleware registration.
Worker/job lifecycle capture through capture_job(...).
Validated PydanticAI, LangChain, LangGraph, OpenAI Agents SDK, LlamaIndex, and CrewAI compatibility scenarios for supported provider calls and tool dispatch inside framework-owned workflows.
replaylab run -- <command> for wrapper capture and child provider auto-patching.
Local regression replay for full-payload OpenAI Responses, Anthropic Messages, Gemini Generate Content, requests, sync httpx, and async httpx capsules.
A localhost product app that groups internal capsules, reports, live experiment traces, generated regressions, and exports under captured-run ownership.
Capsule inspection, replay report inspection, and capsule-to-report comparison.
Viewer-first local React diagnostics plus dependency-free static HTML fallbacks.
Optional AI-assisted report explanations and instrumentation plans from secret-safe summaries.
Pytest regression generation from supported full-payload capsules.
Public-alpha candidate package metadata, typed-package markers, and wheel install smoke checks.
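
Two items in the list above, capture-time redaction and content-addressed payload files, combine naturally: redact first, then name the stored bytes by their hash. This is a hypothetical sketch. The `SECRET_KEYS` rule, the `payloads/<prefix>/<digest>.json` layout, and the function names are assumptions, not ReplayLab's on-disk format.

```python
import hashlib
import json
import re
import tempfile
from pathlib import Path
from typing import Any

# Hypothetical redaction rules; a real tool would make these configurable.
SECRET_KEYS = {"authorization", "api_key"}
BEARER = re.compile(r"Bearer\s+\S+")


def redact(value: Any) -> Any:
    """Recursively mask secret keys and bearer tokens before anything is written."""
    if isinstance(value, dict):
        return {k: ("[REDACTED]" if k.lower() in SECRET_KEYS else redact(v)) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return BEARER.sub("Bearer [REDACTED]", value)
    return value


def store_payload(root: Path, payload: Any) -> str:
    """Redact, canonicalize, and write payload under its SHA-256 digest; return the digest."""
    data = json.dumps(redact(payload), sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    path = root / "payloads" / digest[:2] / f"{digest}.json"
    if not path.exists():  # identical payloads are stored only once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return digest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    digest_a = store_payload(root, {"model": "m", "headers": {"Authorization": "Bearer sk-123"}})
    digest_b = store_payload(root, {"headers": {"Authorization": "Bearer sk-456"}, "model": "m"})
    stored_text = (root / "payloads" / digest_a[:2] / f"{digest_a}.json").read_text()
    file_count = len(list((root / "payloads").rglob("*.json")))
```

Hashing after redaction and JSON canonicalization (sorted keys, fixed separators) means payloads that differ only in secrets or key order deduplicate to one file.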

Current Boundaries

ReplayLab is local-first. The MVP does not upload data by default, merge child provider records back into wrapper capsules at the schema level, support multimodal streaming or streaming HTTP bodies, support file or multipart HTTP uploads, group hosted issues, or provide a hosted web UI yet.