Microsoft Framework Tool API Evaluation

This report records the Task 130 evaluation of Microsoft-related Python agent frameworks before ReplayLab adds another automatic execution-tool tracing integration. It is not an implementation announcement: no auto-patch labels, schema values, frontend labels, or safe-workflow semantics exist yet for these frameworks.

The evaluation used current upstream documentation plus isolated uv --no-project probes so the packages did not become ReplayLab runtime dependencies.

Decision Rubric

ReplayLab should choose the next framework hook by these criteria, applied in priority order:

Exact local callable execution evidence without requiring users to wrap callables in replaylab.trace_tool(...).
A stable dispatch hook that can fail open if upstream internals drift.
A deterministic loopback scenario that proves provider-visible model tools and local execution evidence together.
Strong customer relevance and likely adoption.
Low duplicate-boundary risk with existing provider and framework hooks.
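Because the criteria apply in priority order, the comparison is lexicographic: a later criterion only breaks ties among candidates that are equal on every earlier one. The Python sketch below illustrates that ordering. The RubricScore class, its 0-2 ratings, and the candidate names are hypothetical and do not encode this report's actual assessments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricScore:
    """Hypothetical 0-2 ratings per rubric criterion; field order = priority order."""

    candidate: str
    exact_callable_evidence: int
    fail_open_dispatch_hook: int
    deterministic_loopback_scenario: int
    customer_relevance: int
    low_duplicate_boundary_risk: int

    def priority_key(self):
        # Tuples compare element by element, so a later criterion only
        # matters when every earlier criterion is tied.
        return (
            self.exact_callable_evidence,
            self.fail_open_dispatch_hook,
            self.deterministic_loopback_scenario,
            self.customer_relevance,
            self.low_duplicate_boundary_risk,
        )


def choose_next(scores):
    """Pick the candidate that wins under the rubric's priority order."""
    return max(scores, key=RubricScore.priority_key)
```

A candidate that is strong on every later criterion still loses to one that is better on exact callable evidence.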

Semantic Kernel Python

Inspected package: semantic-kernel==1.36.0, import root semantic_kernel.

Tool registration is explicit and promising. Python callables are exposed with the semantic_kernel.functions.kernel_function decorator, then registered through Kernel.add_plugin(...) or Kernel.add_function(...). Direct invocation goes through await Kernel.invoke(...).

The best dispatch hook candidate is KernelFunction.invoke(...), with exact callable attribution available for native functions via KernelFunctionFromMethod.method. A registered native function also exposes safe tool identity through name, plugin_name, and metadata. A V1 integration should patch the function-dispatch layer, not the decorator alone, so it records actual execution success or failure.
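A minimal sketch of a fail-open dispatch patch, assuming an async invoke(...) method like the one described above. FakeKernelFunction is a hypothetical stand-in, not semantic_kernel's class, and EVENTS, _safe_record, and patch_async_dispatch are illustrative names rather than ReplayLab APIs. The wrapper reads identity from name and plugin_name, records success or failure after the call actually runs, re-raises the original exception, and leaves the class untouched if the method is missing or already patched.

```python
import functools

EVENTS = []  # hypothetical in-memory sink standing in for ReplayLab's recorder


def _safe_record(event):
    try:
        EVENTS.append(event)
    except Exception:
        pass  # evidence capture must never break the host application


def patch_async_dispatch(cls, method_name="invoke"):
    """Wrap cls.<method_name> so each awaited call records its outcome."""
    original = getattr(cls, method_name, None)
    if original is None or getattr(original, "__replaylab_patched__", False):
        return False  # upstream drift or already patched: fail open

    @functools.wraps(original)
    async def wrapper(self, *args, **kwargs):
        identity = {
            "name": getattr(self, "name", None),
            "plugin_name": getattr(self, "plugin_name", None),
        }
        try:
            result = await original(self, *args, **kwargs)
        except Exception as exc:
            _safe_record({**identity, "outcome": "error", "error_type": type(exc).__name__})
            raise  # preserve the original exception for the caller
        _safe_record({**identity, "outcome": "ok"})
        return result

    wrapper.__replaylab_patched__ = True
    setattr(cls, method_name, wrapper)
    return True


class FakeKernelFunction:
    """Hypothetical stand-in for a native KernelFunction; not the real class."""

    def __init__(self, name, plugin_name, method):
        self.name = name
        self.plugin_name = plugin_name
        self.method = method

    async def invoke(self, kernel, arguments=None):
        return self.method(**(arguments or {}))
```

Patching the dispatch method rather than the decorator is what lets the event reflect a real execution outcome instead of mere registration.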

Auto function calling is less direct. Upstream docs describe function-call content that can execute kernel functions, but the probe found native function dispatch clearer than a public FunctionCallContent.invoke(...) hook, so the auto-invocation path in the installed Python package should be rechecked during implementation. External prompt-template functions and service-provided tools should remain limited evidence unless an app-root Python callable is resolved.
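One way to express the exact-versus-limited rule is to resolve the callable behind a function object and accept it only if its source file lives under the application root. This sketch assumes the method attribute described above for native functions; classify_evidence and the "exact"/"limited" labels are hypothetical. Objects such as functools.partial have no source file of their own, so this sketch classifies them as limited.

```python
import inspect
from pathlib import Path


def classify_evidence(function_obj, app_root):
    """Return "exact" only when an app-root Python callable can be resolved."""
    # Native functions are assumed to expose the Python callable on `.method`;
    # prompt-template and service-provided functions have nothing to resolve.
    target = getattr(function_obj, "method", None)
    if target is None or not callable(target):
        return "limited"
    try:
        source = inspect.getsourcefile(inspect.unwrap(target))
    except TypeError:
        return "limited"  # builtins, C extensions, partials: no Python source file
    if source is None:
        return "limited"
    try:
        Path(source).resolve().relative_to(Path(app_root).resolve())
    except ValueError:
        return "limited"  # callable lives outside the application root
    return "exact"
```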

The provider protocol path is connector-dependent. The first scenario should likely use a supported OpenAI connector with function-choice behavior and the existing loopback provider, then prove that the model-visible function name matches the KernelFunction execution evidence.
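The name check in that scenario could be as small as the following sketch. It assumes the connector advertises each function as plugin name, a separator, and function name. The separator is a parameter because this report does not pin down the exact format, and the event dictionaries and unmatched_tool_calls are hypothetical.

```python
def unmatched_tool_calls(provider_tool_names, execution_events, separator="-"):
    """Return provider-visible tool names that have no matching execution event.

    ASSUMPTION: the connector advertises functions as
    "<plugin_name><separator><name>"; confirm the real format before relying on it.
    """
    executed = set()
    for event in execution_events:
        plugin = event.get("plugin_name")
        name = event["name"]
        executed.add(f"{plugin}{separator}{name}" if plugin else name)
    return [tool for tool in provider_tool_names if tool not in executed]
```

An empty result means every tool the model called has matching KernelFunction execution evidence.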

Assessment: feasible, but the first implementation will need careful scenario design around the OpenAI connector and auto function-call path.

Microsoft Agent Framework Python

Inspected package: agent-framework==1.5.0, import root agent_framework.

The installed package exposes Agent, tool, and FunctionTool. Current documentation also mentions ai_function, but the probe did not find agent_framework.ai_function in 1.5.0, so implementation should target tool and plain functions passed to Agent(..., tools=[...]).

Exact callable attribution is available for decorated tools: agent_framework.tool(...) returns a FunctionTool that carries safe identity in name and description and keeps the original callable on func. The best dispatch hook candidates are FunctionTool.invoke(...) and FunctionTool.__call__(...). Agent.run(...) accepts additional per-run tools, so registration-time capture alone would be insufficient; dispatch-time tracing is required.

The package has a product-readiness caveat. A stable-only resolve with uv --with agent-framework failed because transitive Azure Search packages require prerelease resolution, while uv --prerelease allow --with agent-framework installed 1.5.0. That makes the framework a less attractive immediate next integration until ReplayLab has a deterministic install strategy for scenarios.
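Until a lockfile-based strategy exists, a scenario harness could at least make the version pin and the prerelease opt-in explicit in the probe command. This hypothetical helper only builds the argument list for uv run and does not execute it. Pinning the top-level package alone does not fix the versions of transitive prerelease dependencies.

```python
def uv_probe_command(requirement, import_root, allow_prerelease=False):
    """Build (but do not run) an isolated uv probe command.

    Hypothetical helper: the requirement should carry an exact pin, and the
    prerelease opt-in is an explicit argument instead of an ad hoc flag.
    """
    command = ["uv", "run", "--no-project"]
    if allow_prerelease:
        command += ["--prerelease", "allow"]
    command += ["--with", requirement, "python", "-c", f"import {import_root}"]
    return command
```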

The provider protocol path is likely OpenAI-compatible through agent_framework.openai.OpenAIChatClient, but the dependency graph is broad and includes hosted and external tool surfaces. Hosted tools, MCP tools, code interpreter, file search, and web search should be treated as limited evidence unless nested app-local FunctionTool execution is observed.

Assessment: technically promising, but package-resolution friction makes it a weaker Task 131 target than AutoGen.

AutoGen Python

Inspected packages: autogen-core==0.7.5, autogen-agentchat==0.7.5, and autogen-ext==0.7.5.

Tool registration and dispatch are the clearest of the three. AssistantAgent(..., tools=[...]) accepts plain functions or autogen_core.tools.FunctionTool instances. FunctionTool stores the original callable on _func, safe tool identity on name, and executes through BaseTool.run_json(...) / FunctionTool.run_json(...) and FunctionTool.run(...).

The recommended hook is autogen_core.tools.BaseTool.run_json(...), with a narrower FunctionTool callable resolver for exact attribution. This records one execution event at actual dispatch time, captures only argument names from the JSON input mapping, and preserves return values and exceptions. If a user explicitly wraps the callable with trace_tool, explicit evidence should continue to suppress framework evidence.
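A minimal sketch of that hook, assuming run_json(...) is an async method whose first positional argument is the JSON-derived argument mapping, as in the hypothetical stand-in below. A context variable marks the active dispatch so that a hypothetical explicit_trace_tool wrapper, standing in for replaylab.trace_tool, can tell the framework hook to stay quiet. Only argument names are recorded; return values and exceptions pass through unchanged. EVENTS and patch_run_json are illustrative names.

```python
import contextvars
import functools
from collections.abc import Mapping

EVENTS = []  # hypothetical recorder sink
_dispatch_frame = contextvars.ContextVar("replaylab_dispatch_frame", default=None)


def explicit_trace_tool(func):
    """Hypothetical stand-in for replaylab.trace_tool."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        frame = _dispatch_frame.get()
        if frame is not None:
            frame["explicit"] = True  # explicit evidence wins over framework evidence
        EVENTS.append({"source": "explicit", "tool": func.__name__})
        return func(*args, **kwargs)

    return wrapper


def patch_run_json(cls):
    original = getattr(cls, "run_json", None)
    if original is None:
        return False  # upstream drift: fail open and leave the class untouched

    @functools.wraps(original)
    async def wrapper(self, args, *rest, **kwargs):
        frame = {"explicit": False}
        token = _dispatch_frame.set(frame)
        arg_names = sorted(args) if isinstance(args, Mapping) else []  # names only, no values
        outcome = "error"
        try:
            result = await original(self, args, *rest, **kwargs)
            outcome = "ok"
            return result  # return value passes through unchanged
        finally:
            _dispatch_frame.reset(token)
            if not frame["explicit"]:
                EVENTS.append({"source": "autogen", "tool": getattr(self, "name", None),
                               "arg_names": arg_names, "outcome": outcome})

    cls.run_json = wrapper
    return True


class FakeFunctionTool:
    """Hypothetical stand-in for autogen_core.tools.FunctionTool."""

    def __init__(self, func, name):
        self._func = func
        self.name = name

    async def run_json(self, args, cancellation_token=None):
        return self._func(**args)
```

If upstream runs synchronous tools in an executor thread via loop.run_in_executor(...), context variables are not copied into that thread automatically, so the suppression signal would need another channel; that should be verified against the installed package.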

Workbench and MCP tools are not local Python callable execution evidence. McpWorkbench lives under autogen_ext.tools.mcp and needs MCP dependencies; those paths should be limited evidence unless a nested app-local FunctionTool dispatch is observed.

The provider protocol path is usually autogen_ext.models.openai.OpenAIChatCompletionClient, which fits ReplayLab's current OpenAI Chat Completions coverage. The deterministic scenario should use an OpenAI-compatible loopback model client, a normal AssistantAgent, one lookup_customer tool, and no manual ReplayLab wrappers. It should assert provider-visible tool declaration and call evidence, AutoGen framework execution evidence, replay verification, the absence of duplicate tool boundaries, and provider replay guard generation.
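The duplicate-boundary and missing-evidence assertions could be expressed as a helper that joins provider tool calls to execution events by call id. This is a sketch under assumptions: check_tool_boundaries, the (call_id, tool_name) pairs, and the call_id/source event keys are hypothetical, not ReplayLab's schema.

```python
from collections import defaultdict


def check_tool_boundaries(provider_calls, execution_events):
    """Return problems found; an empty list means the boundary checks pass.

    provider_calls: (call_id, tool_name) pairs from provider-visible tool calls.
    execution_events: dicts with hypothetical "call_id" and "source" keys.
    """
    by_call = defaultdict(list)
    for event in execution_events:
        by_call[event["call_id"]].append(event["source"])

    problems = []
    expected = set()
    for call_id, tool_name in provider_calls:
        expected.add(call_id)
        sources = by_call.get(call_id, [])
        if not sources:
            problems.append(f"missing execution evidence: {tool_name} ({call_id})")
        elif len(sources) > 1:
            problems.append(f"duplicate tool boundaries: {tool_name} ({call_id}) from {sorted(sources)}")
    for call_id in sorted(set(by_call) - expected):
        problems.append(f"execution without provider call: {call_id}")
    return problems
```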

Assessment: best Task 131 target. It has stable install behavior, a clear dispatch method, direct callable attribution, and a realistic loopback-provider scenario path.

Recommended Implementation Order

Task 131 should implement AutoGen tool-dispatch tracing first.

Task 132 should evaluate or implement Semantic Kernel once the OpenAI connector scenario is confirmed. Microsoft Agent Framework should follow only after the scenario dependency strategy is explicit, because the current package requires prerelease resolution and brings a broad tool surface.