
Tutorial: Capture And Replay Anthropic Messages

This tutorial captures a real Anthropic Messages call from a normal Python app at startup, replays it locally, compares the replay report against the capture, and generates a pytest replay guard for the provider call.

ReplayLab supports the Anthropic provider SDK natively. It wraps Anthropic().messages.create(...), AsyncAnthropic().messages.create(...), and the sync messages.with_raw_response.create(...).parse() path. Streaming through messages.create(..., stream=True) and messages.stream(...) is supported when the stream is fully consumed. Batches, files, Bedrock/Vertex clients, and OpenAI-compatible Anthropic routing are not supported in this slice.

Setup

Install the packages and set ANTHROPIC_API_KEY. Do not paste secret values into tutorial files, notebooks, docs, or terminal output.

python -m venv .venv
source .venv/bin/activate
pip install replaylab anthropic
export ANTHROPIC_API_KEY="..."
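If you want to fail fast when the key is missing without ever echoing its value, a small check helps. This is a generic sketch, not part of ReplayLab; the helper name missing_vars is illustrative:

```python
import os


def missing_vars(required, env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


# Checked against a stand-in mapping so no real secret is involved:
print(missing_vars(["ANTHROPIC_API_KEY"], env={"ANTHROPIC_API_KEY": "set"}))
```

Raise or exit on a non-empty result at startup; never print the variable's value.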

For local repo development before package publication:

uv sync --all-packages --all-groups
uv pip install anthropic
export ANTHROPIC_API_KEY="..."

Startup Instrumentation

Create tutorial_anthropic_app.py. The app keeps normal Anthropic provider code. ReplayLab setup happens near startup, before the Anthropic client is constructed.

import os

import replaylab
from anthropic import Anthropic
from replaylab import CapturePayloadPolicy

MODEL = os.environ.get("REPLAYLAB_TUTORIAL_ANTHROPIC_MODEL", "claude-sonnet-4-5")


def call_model() -> str:
    client = Anthropic()
    response = client.messages.create(
        model=MODEL,
        max_tokens=200,
        system="You explain technical testing tools in one concise paragraph.",
        messages=[
            {
                "role": "user",
                "content": "Explain why deterministic replay helps agent regression tests.",
            }
        ],
    )
    first_block = response.content[0]
    return getattr(first_block, "text", "")


def call_model_streaming() -> str:
    client = Anthropic()
    with client.messages.stream(
        model=MODEL,
        max_tokens=200,
        system="You explain technical testing tools in one concise paragraph.",
        messages=[
            {
                "role": "user",
                "content": "Explain why deterministic replay helps agent regression tests.",
            }
        ],
    ) as stream:
        return "".join(stream.text_stream)


def main() -> None:
    handle = replaylab.init(
        project_name="tutorial-anthropic",
        auto_patch_integrations=("anthropic",),
        capture_payload_policy=CapturePayloadPolicy.FULL,
    )

    with handle.capture(
        "anthropic_messages_tutorial",
        labels=("tutorial", "anthropic"),
        runtime_metadata={"anthropic.model": MODEL},
    ) as capture:
        print(call_model())  # swap in call_model_streaming() to exercise the streaming path

    if capture.capsule is not None:
        print(f"ReplayLab capsule: {capture.capsule.capsule_path}")


if __name__ == "__main__":
    main()
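Note that call_model() reads response.content[0] directly. In the Anthropic SDK, response.content is a list of typed blocks, and the first block is not guaranteed to be text when features like tool use are enabled. A more defensive extraction joins all text blocks; this sketch uses stand-in objects instead of a live call, mirroring the SDK's block shape (.type and, for text blocks, .text):

```python
from types import SimpleNamespace


def extract_text(content_blocks) -> str:
    """Join the text of all text blocks, skipping non-text blocks."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in blocks for illustration (no live API call needed):
blocks = [
    SimpleNamespace(type="tool_use", name="lookup"),  # hypothetical non-text block
    SimpleNamespace(type="text", text="Deterministic replay "),
    SimpleNamespace(type="text", text="removes provider nondeterminism."),
]
print(extract_text(blocks))
```

The same function works on a real Message, since extract_text only relies on attribute access.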

Capture Scope

Capture uses your normal app command. There is no ReplayLab wrapper process in this production-style path.

uv run python tutorial_anthropic_app.py

What good looks like:

<one paragraph from Claude>
ReplayLab capsule: .replaylab/capsules/<capsule_id>

Replay

Replay is local regression tooling. It runs the same application command under replaylab replay; when the request matches the capsule, ReplayLab serves the recorded Anthropic response instead of calling the live provider.

uv run replaylab replay <capsule_id> \
  --local-store-root .replaylab \
  --auto-patch-integrations anthropic \
  --report-id replay_tutorial_anthropic \
  -- python tutorial_anthropic_app.py
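To build intuition for "when the request matches the capsule": one common way deterministic replay tooling matches requests is by fingerprinting a canonical form of the request payload, so key order and formatting do not matter. This is an illustration of the idea only, not ReplayLab's actual matching implementation:

```python
import hashlib
import json


def request_fingerprint(params: dict) -> str:
    """Hash a canonical JSON encoding of the request parameters."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


recorded = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 200,
    "messages": [{"role": "user", "content": "Explain replay."}],
}
# Same logical request, different key order:
replayed = {
    "max_tokens": 200,
    "model": "claude-sonnet-4-5",
    "messages": [{"role": "user", "content": "Explain replay."}],
}
print(request_fingerprint(recorded) == request_fingerprint(replayed))
```

The practical consequence: if your app changes the prompt, model, or parameters, the request no longer matches the capsule and replay reports a problem instead of silently serving stale output.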

Compare

uv run replaylab report compare \
  <capsule_id> \
  .replaylab/replays/replay_tutorial_anthropic/report.json \
  --local-store-root .replaylab

What good looks like:

Status: succeeded
Expected boundaries: 1
Replayed: 1
Problems: 0
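If you want to assert the same conditions programmatically on the replay report, a minimal sketch follows. The field names ("status", "problems") are inferred from the compare output above; verify them against the report.json your ReplayLab version actually writes:

```python
import json
import tempfile
from pathlib import Path


def assert_clean_replay(report_path: Path) -> None:
    """Fail if the replay report is not a clean success."""
    report = json.loads(report_path.read_text())
    assert report.get("status") == "succeeded", report
    assert not report.get("problems"), report["problems"]


# Exercise the check against a synthetic report file:
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.json"
    path.write_text(json.dumps({"status": "succeeded", "problems": []}))
    assert_clean_replay(path)
    print("replay report is clean")
```

In practice, point assert_clean_replay at .replaylab/replays/replay_tutorial_anthropic/report.json after a replay run.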

Generate-Test

uv run replaylab generate-test <capsule_id> \
  --output tests/regression/test_tutorial_anthropic_replay.py \
  --fixture-root tests/fixtures/replaylab/capsules \
  --app-root . \
  --auto-patch-integrations anthropic \
  -- python tutorial_anthropic_app.py

Run the generated test:

uv run pytest tests/regression/test_tutorial_anthropic_replay.py

The generated test uses replaylab replay, asserts the replay report, and avoids a live Anthropic call.

Local App

Start the app after capture or replay:

uv run replaylab app --local-store-root .replaylab

Open the captured run. The trace should show one Anthropic LLM call with a provider chip labeled anthropic and formatted input/output previews; a streaming chip appears when the call used a stream. After replay, the run shows attached regression replay evidence; after generate-test, it shows the generated guard evidence.

Maintainer Loopback Scenario

Maintainers can validate the same loop without an Anthropic API key:

python scripts/run_scenario.py run anthropic-local --keep-workspace
python scripts/run_scenario.py run anthropic-streaming-local --keep-workspace

These scenarios use the real anthropic SDK against a local fake Anthropic Messages server, then capture, replay, compare, export, generate a guard, run pytest, and check the local app trace shape. The streaming variant fully consumes a deterministic Messages stream and replays the recorded event sequence.