Tutorial: Capture And Replay Anthropic Messages
This tutorial captures a real Anthropic Messages call from a normally started Python app, replays it locally, compares the replay report against the capture, and generates a pytest replay guard for the provider call.
ReplayLab supports the Anthropic SDK natively. It wraps
Anthropic().messages.create(...), AsyncAnthropic().messages.create(...), and the sync
messages.with_raw_response.create(...).parse() path. Core streaming through
messages.create(..., stream=True) and messages.stream(...) is supported as long as the stream is
fully consumed. Batches, files, Bedrock/Vertex clients, and OpenAI-compatible Anthropic routing
are not supported in this slice.
Setup
Install the packages and set ANTHROPIC_API_KEY.
Do not paste secret values into tutorial files, notebooks, docs, or terminal output.
python -m venv .venv
source .venv/bin/activate
pip install replaylab anthropic
export ANTHROPIC_API_KEY="..."
For local repo development before package publication:
uv sync --all-packages --all-groups
uv pip install anthropic
export ANTHROPIC_API_KEY="..."
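Before running anything, it can help to fail fast when the key is missing. A minimal sketch (the helper name is illustrative, not part of ReplayLab or the Anthropic SDK), which honors the rule above by never printing the secret itself:

```python
import os


def require_api_key(name: str = "ANTHROPIC_API_KEY") -> None:
    """Exit early when the key is missing; never echo the secret value."""
    if not os.environ.get(name):
        raise SystemExit(f"{name} is not set; export it before running the tutorial")
```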
Startup Instrumentation
Create tutorial_anthropic_app.py.
The app keeps normal Anthropic provider code.
ReplayLab setup happens near startup, before the Anthropic client is constructed.
import os

import replaylab
from anthropic import Anthropic
from replaylab import CapturePayloadPolicy

MODEL = os.environ.get("REPLAYLAB_TUTORIAL_ANTHROPIC_MODEL", "claude-sonnet-4-5")


def call_model() -> str:
    client = Anthropic()
    response = client.messages.create(
        model=MODEL,
        max_tokens=200,
        system="You explain technical testing tools in one concise paragraph.",
        messages=[
            {
                "role": "user",
                "content": "Explain why deterministic replay helps agent regression tests.",
            }
        ],
    )
    first_block = response.content[0]
    return getattr(first_block, "text", "")


def call_model_streaming() -> str:
    client = Anthropic()
    with client.messages.stream(
        model=MODEL,
        max_tokens=200,
        system="You explain technical testing tools in one concise paragraph.",
        messages=[
            {
                "role": "user",
                "content": "Explain why deterministic replay helps agent regression tests.",
            }
        ],
    ) as stream:
        return "".join(stream.text_stream)


def main() -> None:
    handle = replaylab.init(
        project_name="tutorial-anthropic",
        auto_patch_integrations=("anthropic",),
        capture_payload_policy=CapturePayloadPolicy.FULL,
    )
    with handle.capture(
        "anthropic_messages_tutorial",
        labels=("tutorial", "anthropic"),
        runtime_metadata={"anthropic.model": MODEL},
    ) as capture:
        print(call_model())
    if capture.capsule is not None:
        print(f"ReplayLab capsule: {capture.capsule.capsule_path}")


if __name__ == "__main__":
    main()
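Note that response.content[0] is not guaranteed to be a text block for every prompt (a tool_use block can come first, for example). A hedged helper, with an illustrative name, that scans for the first text block instead of trusting index zero:

```python
from typing import Any, Sequence


def first_text(content: Sequence[Any]) -> str:
    """Return the text of the first text block, or "" when the response has none."""
    for block in content:
        text = getattr(block, "text", None)
        if isinstance(text, str):
            return text
    return ""
```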
Capture Scope
Capture uses your normal app command. There is no ReplayLab wrapper process in this production-style path.
uv run python tutorial_anthropic_app.py
What good looks like:
<one paragraph from Claude>
ReplayLab capsule: .replaylab/capsules/<capsule_id>
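If you did not copy the capsule id from the output, you can pick the newest capsule directory programmatically. This sketch assumes only the .replaylab/capsules/&lt;capsule_id&gt; layout shown above; the function name is illustrative:

```python
from pathlib import Path


def latest_capsule_id(store_root: str = ".replaylab") -> str:
    """Name of the most recently modified capsule directory under the store."""
    capsules_dir = Path(store_root) / "capsules"
    newest = max(capsules_dir.iterdir(), key=lambda p: p.stat().st_mtime)
    return newest.name
```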
Replay
Replay is local regression tooling.
It runs the same application command under replaylab replay; when the request matches the capsule,
ReplayLab serves the recorded Anthropic response instead of calling the live provider.
uv run replaylab replay <capsule_id> \
--local-store-root .replaylab \
--auto-patch-integrations anthropic \
--report-id replay_tutorial_anthropic \
-- python tutorial_anthropic_app.py
Compare
uv run replaylab report compare \
<capsule_id> \
.replaylab/replays/replay_tutorial_anthropic/report.json \
--local-store-root .replaylab
What good looks like:
Status: succeeded
Expected boundaries: 1
Replayed: 1
Problems: 0
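In CI you may want this summary as data rather than text. A small sketch that parses the printed format shown above (field names follow the printed output, not a documented schema):

```python
def parse_compare_summary(text: str) -> dict[str, str]:
    """Split "Key: value" lines from replaylab report compare output."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields
```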
Generate-Test
uv run replaylab generate-test <capsule_id> \
--output tests/regression/test_tutorial_anthropic_replay.py \
--fixture-root tests/fixtures/replaylab/capsules \
--app-root . \
--auto-patch-integrations anthropic \
-- python tutorial_anthropic_app.py
Run the generated test:
uv run pytest tests/regression/test_tutorial_anthropic_replay.py
The generated test uses replaylab replay, asserts the replay report, and avoids a live Anthropic
call.
Local App
Start the app after capture or replay:
uv run replaylab app --local-store-root .replaylab
Open the captured run. The trace should show one Anthropic LLM call, a provider chip labeled
anthropic, formatted input/output previews, a streaming chip when the call used a stream,
regression replay evidence attached after replay, and guard evidence attached after generation.
Maintainer Loopback Scenario
Maintainers can validate the same loop without an Anthropic API key:
python scripts/run_scenario.py run anthropic-local --keep-workspace
python scripts/run_scenario.py run anthropic-streaming-local --keep-workspace
That scenario uses the real anthropic SDK against a local fake Anthropic Messages server, then
captures, replays, compares, exports, generates a guard, runs pytest, and checks the local app trace
shape. The streaming variant fully consumes a deterministic Messages stream and replays the recorded
event sequence.