OpenAI Agents SDK Compatibility

This tutorial validates ReplayLab with normal OpenAI Agents SDK function-tool dispatch. ReplayLab captures provider calls through the OpenAI Responses adapter and records execution-tool evidence through the Agents SDK dispatch path, without requiring replaylab.trace_tool(...) in the tool function.

The integration is evidence-only. It does not enforce tool policy, mock tool results, sandbox execution, or change OpenAI hosted-tool behavior.

Why This Matters

OpenAI Agents SDK applications usually register local Python tools with @function_tool or FunctionTool, then let the framework dispatch those tools after a provider tool call. ReplayLab needs to prove that the local callable actually ran, and it needs to do so without asking users to wrap every tool.

The openai_agents auto-patch label records that dispatch boundary while preserving the original return value and exception behavior.
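
To make "preserving the original return value and exception behavior" concrete, here is a minimal sketch of a recording wrapper. It is not ReplayLab's implementation: record_dispatch and the EVIDENCE list are hypothetical names used only for illustration, and the real hook sits at the Agents SDK dispatch path rather than on the function itself.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory sink; ReplayLab's real storage is not shown in this tutorial.
EVIDENCE: list[dict[str, Any]] = []


def record_dispatch(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a callable so each call leaves an evidence record behind."""

    @functools.wraps(tool)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        started = time.perf_counter()
        status = "failure"  # Assume failure until the call returns normally.
        try:
            result = tool(*args, **kwargs)
            status = "success"
            return result  # The original return value passes through untouched.
        finally:
            # Runs on both paths; an exception raised by the tool keeps propagating.
            EVIDENCE.append(
                {
                    "tool": tool.__name__,
                    "status": status,
                    "duration_s": time.perf_counter() - started,
                }
            )

    return wrapper


def lookup_customer(customer_id: str) -> str:
    return f"customer={customer_id};tier=standard"


wrapped = record_dispatch(lookup_customer)
print(wrapped("cus_123"))  # customer=cus_123;tier=standard
```

The try/finally shape is the point of the sketch: the record is written whether the tool returns or raises, and the wrapper never swallows or replaces the outcome.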

Run The Scenario

Run:

python scripts/run_scenario.py run openai-agents-tool-local --keep-workspace

Expected ending:

ReplayLab scenario passed.
Scenario: openai-agents-tool-local
Tier: loopback
Boundaries: 3
Providers: openai, execution_tool

For this scenario, ReplayLab:

1. Creates a clean temporary virtual environment.
2. Installs the current checkout plus openai-agents, openai, and pytest.
3. Starts a deterministic OpenAI Responses-compatible loopback provider for capture.
4. Runs a normal Agents SDK tool loop.
5. Stops the endpoint before replay.
6. Exports the React viewer.
7. Generates a pytest provider replay guard and runs that generated test.

App Shape

The generated scenario app initializes ReplayLab once, keeps normal Agents SDK tool registration, and uses the framework runner as usual:

import replaylab
from agents import Agent, RunConfig, Runner, function_tool, set_default_openai_client
from openai import AsyncOpenAI
from replaylab import CapturePayloadPolicy

handle = replaylab.init(
    project_name="openai-agents-tool-local",
    auto_patch_integrations=("openai", "openai_agents"),
    capture_payload_policy=CapturePayloadPolicy.FULL,
)


@function_tool
def lookup_customer(customer_id: str) -> str:
    """Return deterministic customer context for the supplied customer ID."""
    return f"customer={customer_id};tier=standard"


set_default_openai_client(
    AsyncOpenAI(base_url="http://127.0.0.1:.../v1", api_key="scenario-key")
)
agent = Agent(
    name="Support triage",
    instructions="Look up the requested customer and return a terse triage label.",
    tools=[lookup_customer],
)

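async def main() -> None:
    # Runner.run is a coroutine, so await it inside an async function.
    with handle.capture("openai_agents_tool_agent"):
        result = await Runner.run(
            starting_agent=agent,
            input="Look up customer cus_123 and classify priority.",
            run_config=RunConfig(model="gpt-5-mini", tracing_disabled=True),
        )

# Start the coroutine from the app's entry point, for example with asyncio.run(main()).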

Provider clients should still be constructed after replaylab.init(...) so OpenAI provider replay wrappers can be installed. The tool function itself does not use ReplayLab decorators; the framework dispatch hook supplies execution-tool evidence.

What ReplayLab Captures

The scenario expects two OpenAI Responses boundaries and one execution-tool boundary:

1. provider=openai resource=openai.responses
2. provider=execution_tool resource=lookup_customer source=openai_agents_framework
3. provider=openai resource=openai.responses
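
The ordering above can be checked mechanically. The following is a sketch only: Boundary and matches_expected are hypothetical names, not ReplayLab's data model or the generated pytest guard, and the check assumes the captured boundaries have already been reduced to provider and resource pairs.

```python
from typing import NamedTuple


class Boundary(NamedTuple):
    """Hypothetical reduced view of one captured boundary."""

    provider: str
    resource: str


# The sequence this scenario expects: provider call, local tool, provider call.
EXPECTED = [
    Boundary("openai", "openai.responses"),
    Boundary("execution_tool", "lookup_customer"),
    Boundary("openai", "openai.responses"),
]


def matches_expected(captured: list[Boundary]) -> bool:
    """Return True only when count, order, and contents all match."""
    return list(captured) == EXPECTED


captured = [
    Boundary("openai", "openai.responses"),
    Boundary("execution_tool", "lookup_customer"),
    Boundary("openai", "openai.responses"),
]
print(matches_expected(captured))  # True
print(sorted({b.provider for b in captured}))  # ['execution_tool', 'openai']
```

Comparing the whole list, rather than only counting boundaries, also catches a tool call that is recorded in the wrong position.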

The execution-tool evidence includes the tool name, callable module and qualified name, safe app-relative source path and line when available, timestamps, duration, success/failure status, and argument names only. It does not record argument values, return values, locals, source text, raw schemas, provider payload bodies, headers, environment values, or absolute paths.
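
As an illustration of the "argument names only" rule, the sketch below builds a record of that kind with the standard library. The field names and the describe_call helper are assumptions made for this example; ReplayLab's actual evidence schema is not shown in this tutorial.

```python
import inspect
import os
from typing import Any, Callable, Mapping, Optional


def describe_call(
    tool: Callable[..., Any], app_root: str, arguments: Mapping[str, Any]
) -> dict[str, Any]:
    """Describe a tool call without keeping argument values or absolute paths."""
    source_path: Optional[str] = None
    line: Optional[int] = None
    try:
        absolute = inspect.getsourcefile(tool)
        start_line = inspect.getsourcelines(tool)[1]
    except (OSError, TypeError):
        absolute, start_line = None, None  # Source is not always available.

    if absolute is not None:
        try:
            relative = os.path.relpath(absolute, app_root)
        except ValueError:
            relative = ".."  # For example, a different drive on Windows.
        # Keep the path only when it stays inside the app root.
        if not relative.startswith(".."):
            source_path, line = relative, start_line

    return {
        "tool": tool.__name__,
        "module": tool.__module__,
        "qualname": tool.__qualname__,
        "source_path": source_path,
        "line": line,
        "argument_names": sorted(arguments),  # Names only; values are dropped.
    }


def lookup_customer(customer_id: str) -> str:
    return f"customer={customer_id};tier=standard"


record = describe_call(lookup_customer, os.getcwd(), {"customer_id": "cus_123"})
print(record["argument_names"])  # ['customer_id']
```

Dropping the source path whenever it falls outside the app root is one simple way to honor the "no absolute paths" constraint; the value cus_123 never enters the record.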

Replay mode verifies the execution-tool boundary and still runs the callable normally. ReplayLab does not serve fake tool results.

What Is Not Yet Supported

  • OpenAI hosted tools as local Python execution evidence.
  • Agent-as-tool delegation as local callable evidence unless nested local function-tool evidence is observed.
  • Agents SDK streaming or non-Responses provider paths beyond the captured provider boundary.
  • Tool enforcement, tool result mocking, or framework-native semantic graph replay.

Use replaylab.trace_tool(...) as the explicit fallback for unsupported frameworks or naked provider SDK tool loops.