
LlamaIndex Compatibility

This tutorial validates ReplayLab with normal LlamaIndex FunctionTool dispatch. ReplayLab captures provider calls through the OpenAI Responses adapter and records execution-tool evidence through the LlamaIndex tool dispatch path, without requiring replaylab.trace_tool(...) in the tool function.

The integration is evidence-only. It does not enforce tool policy, mock tool results, sandbox execution, or add LlamaIndex-native graph or retrieval replay semantics.

Why This Matters

LlamaIndex applications commonly register local Python functions as FunctionTool objects and call those tools after a provider-visible tool request. ReplayLab needs to prove that the local callable actually ran, without asking users to wrap every tool.

Enabling the llama_index auto-patch label installs a framework dispatch hook that records that boundary while preserving the tool's original return value and exception behavior.
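The general technique behind a dispatch hook that "records without changing behavior" can be sketched in plain Python. This is a hypothetical illustration, not ReplayLab's implementation: `with_dispatch_evidence` and the `evidence` list are names invented for this example. The wrapper records timing and status around each call, returns the callable's value unchanged, and lets any exception propagate untouched.

```python
import functools
import time
from typing import Any, Callable


def with_dispatch_evidence(
    fn: Callable[..., Any], evidence: list[dict[str, Any]]
) -> Callable[..., Any]:
    """Hypothetical wrapper: record timing and status around fn without changing its behavior."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        started_at = time.time()
        t0 = time.perf_counter()
        status = "failure"  # overwritten only if fn returns normally
        try:
            result = fn(*args, **kwargs)
            status = "success"
            return result  # the original return value passes through unchanged
        finally:
            # Runs on success and on error; an exception from fn still propagates as-is.
            evidence.append(
                {
                    "tool": fn.__name__,
                    "started_at": started_at,
                    "duration_s": time.perf_counter() - t0,
                    "status": status,
                }
            )

    return wrapper


def lookup_customer(customer_id: str) -> str:
    return f"customer={customer_id};tier=standard"


def failing_tool(customer_id: str) -> str:
    raise ValueError(f"unknown customer {customer_id}")


evidence: list[dict[str, Any]] = []
tool = with_dispatch_evidence(lookup_customer, evidence)
print(tool(customer_id="cus_123"))  # customer=cus_123;tier=standard
print(evidence[-1]["status"])       # success
```

Using `try`/`finally` rather than `except` is the design choice that keeps exception behavior intact: the wrapper never catches the error, it only observes that the call did not complete normally.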

Run The Scenario

Run:

python scripts/run_scenario.py run llamaindex-tool-local --keep-workspace

Expected ending:

ReplayLab scenario passed.
Scenario: llamaindex-tool-local
Tier: loopback
Boundaries: 3
Providers: openai, execution_tool
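If a CI job should assert on this ending, a small parser is enough. This sketch assumes the summary keeps the `Key: value` line format shown above; `parse_scenario_summary` is a hypothetical helper, not part of scripts/run_scenario.py.

```python
def parse_scenario_summary(text: str) -> dict[str, str]:
    """Hypothetical helper: collect the 'Key: value' lines of a scenario summary."""
    summary: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():  # lines without a colon (the pass banner) are skipped
            summary[key.strip()] = value.strip()
    return summary


output = """ReplayLab scenario passed.
Scenario: llamaindex-tool-local
Tier: loopback
Boundaries: 3
Providers: openai, execution_tool"""

summary = parse_scenario_summary(output)
providers = {p.strip() for p in summary["Providers"].split(",")}
print(summary["Boundaries"], sorted(providers))
```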

For this scenario, ReplayLab:

1. Creates a clean temporary virtual environment.
2. Installs the current checkout plus llama-index-core, openai, and pytest.
3. Starts a deterministic OpenAI Responses-compatible loopback provider for capture.
4. Runs a normal LlamaIndex FunctionTool tool loop.
5. Stops the endpoint before replay.
6. Exports the React viewer.
7. Generates a pytest provider replay guard and runs that generated test.

App Shape

The generated scenario app initializes ReplayLab once, keeps normal LlamaIndex tool registration, and calls the tool through LlamaIndex:

import openai
import replaylab
from llama_index.core.tools import FunctionTool
from replaylab import CapturePayloadPolicy

handle = replaylab.init(
    project_name="llamaindex-tool-local",
    auto_patch_integrations=("openai", "llama_index"),
    capture_payload_policy=CapturePayloadPolicy.FULL,
)


def lookup_customer(customer_id: str) -> str:
    """Return deterministic customer context for the supplied customer ID."""
    return f"customer={customer_id};tier=standard"


tool = FunctionTool.from_defaults(fn=lookup_customer, name="lookup_customer")
client = openai.OpenAI(base_url="http://127.0.0.1:.../v1", api_key="scenario-key")

with handle.capture("llamaindex_tool_agent"):
    first = client.responses.create(
        model="gpt-5-mini",
        input="Look up customer cus_123 and classify priority.",
        tools=[...],
    )
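    # Dispatch through LlamaIndex, not through ReplayLab wrappers.
    tool_output = tool.call(customer_id="cus_123")
    # A second client.responses.create(...) call then sends tool_output back to
    # the model; that call is the third captured boundary.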

Construct provider clients after replaylab.init(...) so that the OpenAI provider replay wrappers can be installed. When constructing clients after patching, prefer module imports such as import openai: binding from openai import OpenAI before initialization can keep a reference to the unpatched constructor. The tool function itself uses no ReplayLab decorators; the framework dispatch hook supplies the execution-tool evidence.
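The import-binding pitfall can be reproduced with the standard library alone. The sketch below uses a throwaway module object named `fake_sdk` (hypothetical, standing in for a provider SDK) and a hand-written patch that replaces a module attribute; it does not use OpenAI or ReplayLab. A name bound with `from ... import` before patching keeps pointing at the original class, while attribute lookup through the module sees the replacement.

```python
import types


class OriginalClient:
    patched = False


class PatchedClient(OriginalClient):
    patched = True


# Hypothetical stand-in for a provider SDK module.
fake_sdk = types.ModuleType("fake_sdk")
fake_sdk.Client = OriginalClient

# Equivalent of `from fake_sdk import Client` executed before patching.
Client = fake_sdk.Client

# Equivalent of an auto-patch step that replaces the module attribute.
fake_sdk.Client = PatchedClient

print(Client().patched)           # False: the early binding kept the original
print(fake_sdk.Client().patched)  # True: module attribute lookup sees the patch
```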

What ReplayLab Captures

The scenario expects two OpenAI Responses boundaries and one execution-tool boundary:

1. provider=openai resource=openai.responses
2. provider=execution_tool resource=lookup_customer source=llama_index_framework
3. provider=openai resource=openai.responses
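A test that consumes captured boundaries could assert this order directly. The record shape below (dicts with `provider`, `resource`, and `source` keys) is an assumption for illustration, not ReplayLab's export format, and `boundary_sequence` is a hypothetical helper.

```python
from typing import Any

EXPECTED = [
    ("openai", "openai.responses"),
    ("execution_tool", "lookup_customer"),
    ("openai", "openai.responses"),
]


def boundary_sequence(records: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Reduce hypothetical boundary records to (provider, resource) pairs in capture order."""
    return [(r["provider"], r["resource"]) for r in records]


records = [
    {"provider": "openai", "resource": "openai.responses"},
    {
        "provider": "execution_tool",
        "resource": "lookup_customer",
        "source": "llama_index_framework",
    },
    {"provider": "openai", "resource": "openai.responses"},
]

print(boundary_sequence(records) == EXPECTED)  # True
```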

The execution-tool evidence includes the tool name, callable module and qualified name, safe app-relative source path and line when available, timestamps, duration, success/failure status, and argument names only. It does not record argument values, return values, query text, retrieved documents, locals, source text, raw schemas, provider payload bodies, headers, environment values, or absolute paths.
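The metadata side of that evidence can be derived from the callable itself without keeping argument values. This sketch uses `inspect` to collect the module, qualified name, argument names, and an app-relative source location; `callable_evidence` and its field names are hypothetical, and ReplayLab's actual record format may differ.

```python
import inspect
import os
from typing import Any, Callable


def callable_evidence(
    fn: Callable[..., Any], args: tuple, kwargs: dict[str, Any], app_root: str
) -> dict[str, Any]:
    """Hypothetical: describe a tool call by names only, never by argument values."""
    bound = inspect.signature(fn).bind(*args, **kwargs)
    record: dict[str, Any] = {
        "module": fn.__module__,
        "qualname": fn.__qualname__,
        "arg_names": list(bound.arguments),  # parameter names; values are discarded
    }
    try:
        path = inspect.getsourcefile(fn)
        _, line = inspect.getsourcelines(fn)
        rel = os.path.relpath(path, app_root) if path else None
    except (OSError, TypeError, ValueError):
        return record  # no source location available (builtins, REPL, other drive)
    # Keep the location only when it lies inside the app root, so no absolute path leaks.
    if rel is not None and not rel.startswith(os.pardir):
        record["source_path"] = rel
        record["source_line"] = line
    return record


def lookup_customer(customer_id: str) -> str:
    return f"customer={customer_id};tier=standard"


print(callable_evidence(lookup_customer, (), {"customer_id": "cus_123"}, os.getcwd()))
```

Binding through `inspect.signature` means positional and keyword calls yield the same argument names, which keeps the evidence stable regardless of how the framework passes arguments.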

Replay mode verifies the execution-tool boundary and still runs the callable normally. ReplayLab does not serve fake tool results.
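A minimal sketch of that replay behavior, under stated assumptions: the recorded boundary is a dict with hypothetical `tool` and `arg_names` fields, `replay_tool_call` and `ReplayMismatch` are invented names, and raising on mismatch is an assumed failure policy. The live call is checked against the recording, and the real callable always produces the result; nothing is served from the recording.

```python
import inspect
from typing import Any, Callable


class ReplayMismatch(Exception):
    """Hypothetical error for a live tool call that diverges from the recording."""


def replay_tool_call(
    recorded: dict[str, Any], fn: Callable[..., Any], *args: Any, **kwargs: Any
) -> Any:
    """Verify a live call against a recorded boundary, then run the real tool."""
    live_names = list(inspect.signature(fn).bind(*args, **kwargs).arguments)
    if recorded["tool"] != fn.__name__ or recorded["arg_names"] != live_names:
        raise ReplayMismatch(
            f"recorded {recorded['tool']}{recorded['arg_names']}, "
            f"live {fn.__name__}{live_names}"
        )
    # The recording stores no return value, so the only result available is the live one.
    return fn(*args, **kwargs)


def lookup_customer(customer_id: str) -> str:
    return f"customer={customer_id};tier=standard"


recorded = {"tool": "lookup_customer", "arg_names": ["customer_id"]}
print(replay_tool_call(recorded, lookup_customer, customer_id="cus_123"))
```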

What Is Not Yet Supported

  • Query-engine or retriever tools as exact app-callable evidence when LlamaIndex does not expose the underlying local callable. ReplayLab may show limited framework dispatch evidence, but it does not overclaim exact callable control.
  • LlamaIndex-native retrieval, index, workflow, or graph semantics as replay contracts.
  • Tool enforcement, tool result mocking, or framework-native semantic graph replay.

Use replaylab.trace_tool(...) as the explicit fallback for unsupported LlamaIndex paths, unsupported frameworks, or naked provider SDK tool loops.