Why ReplayLab?

Agent workflows fail in ways that are hard to reproduce. A user prompt changes, a model returns a different plan, an HTTP dependency is slower than usual, or a tool call mutates external state before anyone can inspect what happened. Traditional traces help explain that something happened, but they rarely leave behind a deterministic way to rerun the same failure.

ReplayLab is built around a smaller loop:

capture -> inspect -> replay -> compare -> generate provider replay guard -> re-run

The goal is not to replace tracing, orchestration, or evals. The goal is to turn one real run into a local artifact that a developer can inspect, replay without live provider calls, and convert into a provider replay guard.

The Problem

Agent applications depend on external boundaries:

  • LLM calls
  • HTTP APIs
  • tools
  • databases
  • queues
  • files

Those boundaries are where nondeterminism and side effects enter the system. If a workflow fails after several provider calls, a developer needs more than logs. They need the exact request/response shape, a way to rerun the app code safely, and a test artifact that keeps the bug fixed.
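As a rough illustration only, a captured boundary record could be as small as the boundary type plus the request and response payloads. The field names and values below are assumptions for this sketch, not ReplayLab's documented capsule format:

    # Hypothetical shape of one captured boundary record; field names and the
    # model identifier are illustrative, not ReplayLab's on-disk format.
    record = {
        "boundary": "llm",  # could also be http, tool, db, queue, or file
        "request": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Plan the next step"}],
        },
        "response": {
            "content": "1. Look up the order status ...",
            "finish_reason": "stop",
        },
    }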

The ReplayLab Approach

ReplayLab captures provider boundaries into a local capsule. During replay, supported provider calls are matched against that capsule and served from stored payloads instead of hitting live services. The same application command runs in both capture and replay mode, so the regression test stays close to real user code.
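A minimal sketch of the matching idea, assuming a hypothetical in-memory capsule keyed by a hash of the request (ReplayLab's real matching rules and storage layout are not described here):

    import hashlib
    import json


    def request_key(request: dict) -> str:
        """Derive a stable key from a provider request (illustrative only)."""
        canonical = json.dumps(request, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


    class ReplayingClient:
        """Serves recorded responses instead of calling the live provider."""

        def __init__(self, capsule: dict[str, dict]):
            self.capsule = capsule  # request key -> stored response payload

        def call(self, request: dict) -> dict:
            key = request_key(request)
            if key not in self.capsule:
                # In replay mode an unmatched call is an error, not a live call.
                raise KeyError(f"no recorded response for request {key[:12]}...")
            return self.capsule[key]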

ReplayLab is intentionally local-first:

  • capsules are written under .replaylab/
  • replay runs without the cloud
  • generated pytest provider replay guards can be checked into the application repo (see the sketch after this list)
  • secrets and payload contents are not printed by inspection commands
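A generated provider replay guard could look roughly like the pytest sketch below. The module path, CLI arguments, environment variable names, and capsule path are assumptions for illustration; the real generated file is not shown here:

    # test_replay_guard.py -- hypothetical shape of a generated guard.
    import os
    import subprocess

    # Hypothetical capsule path under .replaylab/
    CAPSULE = ".replaylab/checkout-failure"


    def test_checkout_workflow_replays_cleanly():
        # Re-run the same application command with replay enabled so supported
        # provider calls are served from the capsule instead of live services.
        result = subprocess.run(
            ["python", "-m", "myapp.checkout", "--order-id", "1234"],
            env={**os.environ, "REPLAYLAB_MODE": "replay", "REPLAYLAB_CAPSULE": CAPSULE},
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0, result.stderr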

When To Use It

ReplayLab is useful when you want to answer:

  • Can I reproduce this agent failure without another live model call?
  • Did my fix preserve the provider interaction sequence?
  • Can I turn this failure into a pytest provider replay guard?
  • Can I inspect what crossed the LLM or HTTP boundary without reading raw traces?

ReplayLab is not yet a hosted issue tracker or a hosted UI. The public-alpha candidate focuses on the local developer loop first.