Microsoft Framework Tool API Evaluation
This report records the Task 130 evaluation of Microsoft-related Python agent frameworks before ReplayLab adds another automatic execution-tool tracing integration. It is not an implementation announcement: no auto-patch labels, schema values, frontend labels, or safe-workflow semantics exist yet for these frameworks.
The evaluation used current upstream documentation plus isolated uv --no-project probes so the
packages did not become ReplayLab runtime dependencies.
Decision Rubric
ReplayLab should implement the next framework hook where it can satisfy these criteria in order:
- Exact local callable execution evidence without replaylab.trace_tool(...).
- A stable dispatch hook that can fail open if upstream internals drift.
- A deterministic loopback scenario that proves provider-visible model tools and local execution evidence together.
- Strong customer relevance and likely adoption.
- Low duplicate-boundary risk with existing provider and framework hooks.
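The "fail open" criterion can be illustrated with a minimal sketch. It assumes a hypothetical helper named install_fail_open_hook and a caller-supplied record callback; neither is an existing ReplayLab API. The idea is that a missing hook target disables tracing rather than raising, and a failing recorder never changes the wrapped call's result.

```python
import functools
import logging

logger = logging.getLogger("replaylab.sketch")  # hypothetical logger name


def _safe_record(record, name, status, error_type):
    # Tracing must never break the application, so recorder failures are swallowed.
    try:
        record({"name": name, "status": status, "error_type": error_type})
    except Exception:
        logger.debug("trace recorder failed", exc_info=True)


def install_fail_open_hook(owner, method_name, record):
    """Wrap owner.method_name so record(...) is called after each invocation.

    Returns False without raising when the target is missing, for example
    because upstream internals drifted.
    """
    original = getattr(owner, method_name, None)
    if not callable(original):
        logger.warning("hook target %r.%s not found; tracing disabled", owner, method_name)
        return False

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        try:
            result = original(*args, **kwargs)
        except Exception as exc:
            # Record the failure, then re-raise so callers see the same exception.
            _safe_record(record, method_name, "error", type(exc).__name__)
            raise
        _safe_record(record, method_name, "ok", None)
        return result

    setattr(owner, method_name, wrapper)
    return True
```

The boolean return value lets an integration report that a hook was skipped instead of failing at import time.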
Semantic Kernel Python
Inspected package: semantic-kernel==1.36.0, import root semantic_kernel.
Tool registration is explicit and promising. Python callables are exposed with
semantic_kernel.functions.kernel_function, then registered through Kernel.add_plugin(...) or
Kernel.add_function(...). Direct invocation goes through await Kernel.invoke(...).
The best dispatch hook candidate is KernelFunction.invoke(...), with exact callable attribution
available for native functions via KernelFunctionFromMethod.method. A registered native function
also exposes safe tool identity through name, plugin_name, and metadata. A V1 integration should
patch the function-dispatch layer, not the decorator alone, so it records actual execution success
or failure.
Auto function calling is less direct. Upstream docs describe function-call content that can execute
kernel functions, but the installed Python package shape should be rechecked during implementation
because the probe found native function dispatch to be clearer than a public
FunctionCallContent.invoke(...) hook. External prompt-template functions and service-provided tools
should remain limited evidence unless an app-root Python callable is resolved.
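The attribution rule described above (exact evidence when a Python callable is resolved, limited evidence otherwise) can be sketched as follows. The classes here are stand-ins, not Semantic Kernel types: they only mirror the name, plugin_name, and method attributes noted in the probe. ToolEvidence and resolve_evidence are hypothetical names, and a real implementation would need to recheck the attributes against the installed package.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolEvidence:
    name: str
    plugin_name: Optional[str]
    callable_qualname: Optional[str]
    evidence: str  # "exact" when a Python callable was resolved, else "limited"


def resolve_evidence(function_obj: Any) -> ToolEvidence:
    # Read attributes defensively so drifted or unexpected objects degrade to
    # limited evidence instead of raising.
    target = getattr(function_obj, "method", None)
    exact = inspect.isfunction(target) or inspect.ismethod(target)
    qualname = f"{target.__module__}.{target.__qualname__}" if exact else None
    return ToolEvidence(
        name=str(getattr(function_obj, "name", "<unknown>")),
        plugin_name=getattr(function_obj, "plugin_name", None),
        callable_qualname=qualname,
        evidence="exact" if exact else "limited",
    )


class StandInNativeFunction:
    """Stand-in for a registered native function (assumed attribute shape)."""

    def __init__(self, name, plugin_name, method):
        self.name = name
        self.plugin_name = plugin_name
        self.method = method


class StandInPromptFunction:
    """Stand-in for a prompt-template function with no Python callable."""

    def __init__(self, name, plugin_name):
        self.name = name
        self.plugin_name = plugin_name


def lookup_customer(customer_id: str) -> dict:
    return {"id": customer_id}


native = resolve_evidence(StandInNativeFunction("lookup_customer", "crm", lookup_customer))
prompt = resolve_evidence(StandInPromptFunction("summarize", "writer"))
```

Here native carries exact evidence with a resolved qualified name, while prompt is classified as limited because no callable could be found.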
The provider protocol path is connector-dependent. The first scenario should likely use a supported
OpenAI connector path with function-choice behavior and the existing loopback provider, then prove
that the model-visible function name matches the KernelFunction execution evidence.
Assessment: feasible, but the first implementation will need careful scenario design around the OpenAI connector and auto function-call path.
Microsoft Agent Framework Python
Inspected package: agent-framework==1.5.0, import root agent_framework.
The installed package exposes Agent, tool, and FunctionTool. Current documentation also
mentions ai_function, but the probe did not find agent_framework.ai_function in 1.5.0, so
implementation should target tool and plain functions passed to Agent(..., tools=[...]).
Exact callable attribution is available for decorated tools: agent_framework.tool(...) returns a
FunctionTool that exposes safe identity on name and description, and the original callable on func.
The best dispatch hook candidates are FunctionTool.invoke(...) and FunctionTool.__call__(...).
Agent.run(...) accepts additional per-run tools, so registration-time capture alone would be
insufficient; dispatch-time tracing is required.
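A small sketch shows why registration-time capture misses per-run tools. StandInTool and StandInAgent are hypothetical stand-ins that only imitate the shape described above (a tool object with name, func, and invoke, and a run call that accepts extra tools); they are not the agent_framework classes.

```python
import functools


class StandInTool:
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def invoke(self, **kwargs):
        return self.func(**kwargs)


class StandInAgent:
    def __init__(self, tools=()):
        self.tools = list(tools)

    def run(self, tool_name, tools=(), **kwargs):
        # Per-run tools are merged with constructor tools only at call time.
        for candidate in [*self.tools, *tools]:
            if candidate.name == tool_name:
                return candidate.invoke(**kwargs)
        raise LookupError(tool_name)


def get_weather(city):
    return f"sunny in {city}"


def get_time(zone):
    return f"noon in {zone}"


agent = StandInAgent(tools=[StandInTool(get_weather)])

# Registration-time capture: only sees what the constructor received.
registered_at_construction = {tool.name for tool in agent.tools}

# Dispatch-time capture: patch the tool's invoke method once, at class level.
dispatched = []
_original_invoke = StandInTool.invoke


@functools.wraps(_original_invoke)
def traced_invoke(self, **kwargs):
    dispatched.append(self.name)
    return _original_invoke(self, **kwargs)


StandInTool.invoke = traced_invoke

agent.run("get_weather", city="Oslo")
agent.run("get_time", tools=[StandInTool(get_time)], zone="UTC")
```

After both runs, registered_at_construction contains only get_weather, while dispatched also includes get_time, the tool supplied per run.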
The package has a product-readiness caveat. A stable-only uv --with agent-framework resolve failed
because transitive Azure Search packages require prerelease resolution; uv --prerelease allow
--with agent-framework installed 1.5.0. That makes the framework less attractive as the immediate
next integration until ReplayLab has a deterministic install strategy for scenarios.
The provider protocol path is likely OpenAI-compatible through agent_framework.openai.OpenAIChatClient,
but the dependency graph is broad and includes hosted/external tool surfaces. Hosted tools, MCP
tools, code interpreter, file search, and web search should be limited evidence unless nested
app-local FunctionTool execution is observed.
Assessment: technically promising, but package-resolution friction makes it a weaker Task 131 target than AutoGen.
AutoGen Python
Inspected packages: autogen-core==0.7.5, autogen-agentchat==0.7.5, and autogen-ext==0.7.5.
Tool registration and dispatch are the clearest of the three. AssistantAgent(..., tools=[...])
accepts plain functions or autogen_core.tools.FunctionTool instances. FunctionTool stores the
original callable on _func, safe tool identity on name, and executes through
BaseTool.run_json(...) / FunctionTool.run_json(...) and FunctionTool.run(...).
The recommended hook is autogen_core.tools.BaseTool.run_json(...), with a narrower
FunctionTool callable resolver for exact attribution. This records one execution event at actual
dispatch time, captures only argument names from the JSON input mapping, and preserves return values
and exceptions. If a user explicitly wraps the callable with trace_tool, explicit evidence should
continue to suppress framework evidence.
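The hook behavior described above can be sketched against a stand-in class. StandInFunctionTool only imitates the _func, name, and async run_json shape noted in the probe; the trace_tool stand-in, the _replaylab_explicit marker, and the EVENTS list are hypothetical and do not reflect ReplayLab's actual implementation. The sketch records argument names only, passes return values and exceptions through unchanged, and skips framework evidence when the callable was explicitly wrapped.

```python
import asyncio
import functools

EVENTS = []  # hypothetical in-memory sink for illustration


class StandInFunctionTool:
    def __init__(self, func, name=None):
        self._func = func
        self.name = name or func.__name__

    async def run_json(self, args, cancellation_token=None):
        return self._func(**args)


def trace_tool(func):
    """Stand-in for an explicit wrapper; marks the callable as already traced."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        EVENTS.append({"source": "explicit", "tool": func.__name__})
        return func(*args, **kwargs)

    wrapper._replaylab_explicit = True  # hypothetical marker attribute
    return wrapper


def patch_run_json(tool_cls):
    original = tool_cls.run_json

    @functools.wraps(original)
    async def traced(self, args, *rest, **kwargs):
        func = getattr(self, "_func", None)
        if getattr(func, "_replaylab_explicit", False):
            # Explicit evidence wins: do not add a second tool boundary.
            return await original(self, args, *rest, **kwargs)
        event = {
            "source": "framework",
            "tool": getattr(self, "name", "<unknown>"),
            "arg_names": sorted(args),  # names only, never argument values
        }
        try:
            result = await original(self, args, *rest, **kwargs)
        except Exception as exc:
            event.update(status="error", error_type=type(exc).__name__)
            EVENTS.append(event)
            raise
        event["status"] = "ok"
        EVENTS.append(event)
        return result

    tool_cls.run_json = traced


def lookup_customer(customer_id):
    if not customer_id:
        raise ValueError("customer_id is required")
    return {"id": customer_id}


patch_run_json(StandInFunctionTool)
plain = StandInFunctionTool(lookup_customer)
explicit = StandInFunctionTool(trace_tool(lookup_customer), name="lookup_customer")

result = asyncio.run(plain.run_json({"customer_id": "c-1"}))
asyncio.run(explicit.run_json({"customer_id": "c-2"}))
```

Each call produces exactly one event: a framework event for the plain tool and an explicit event for the wrapped one.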
Workbench and MCP tools are not local Python callable execution evidence. McpWorkbench lives under
autogen_ext.tools.mcp and needs MCP dependencies; those paths should be limited evidence unless a
nested app-local FunctionTool dispatch is observed.
The provider protocol path is usually autogen_ext.models.openai.OpenAIChatCompletionClient, which fits
ReplayLab's current OpenAI Chat Completions coverage. The deterministic scenario should use an
OpenAI-compatible loopback model client, a normal AssistantAgent, one lookup_customer tool, and
no manual ReplayLab wrappers. It should assert provider-visible tool declaration/call evidence,
AutoGen framework execution evidence, replay verification, no duplicate tool boundaries, and
provider replay guard generation.
Assessment: best Task 131 target. It has stable install behavior, a clear dispatch method, direct callable attribution, and a realistic loopback-provider scenario path.
Recommended Implementation Order
Task 131 should implement AutoGen tool-dispatch tracing first.
Task 132 should evaluate or implement Semantic Kernel once the OpenAI connector scenario is confirmed. Microsoft Agent Framework should follow only after the scenario dependency strategy is explicit, because the current package requires prerelease resolution and brings a broad tool surface.