Microsoft Agent Framework: Design Decisions, Demo Plan and Risk Assessment

When Microsoft quietly merged the teams behind Semantic Kernel and AutoGen into a single project, the result was not a simple rebrand. The Microsoft Agent Framework represents a ground-up rethink of how developers build, orchestrate, and deploy AI agents across Python and .NET. It shipped its 1.0.0 release in April 2026 with 9,000+ GitHub stars, 121 contributors, and 24 Python packages spanning everything from core abstractions to Anthropic and Bedrock integrations. This article dives deep into the design choices that shaped the framework, lays out a practical demo plan you can follow in Python, and flags every high-risk dependency and lock-in vector I found during a thorough code review.
The Lineage: From Semantic Kernel and AutoGen to One Framework
The story begins with two Microsoft open-source projects that took different philosophical approaches to AI agents. Semantic Kernel offered enterprise-grade plumbing -- session management, type safety, dependency injection, telemetry -- but made simple tasks verbose. AutoGen pioneered accessible multi-agent conversation patterns but lacked the middleware and observability hooks that production systems demand. Agent Framework is the declared successor to both, built by the same core engineers (Eduard van Valkenburg, Mark Wallace, Roger Barreto, Stephen Toub, and Sergey Menshykh among them). Microsoft provides official migration guides from both predecessors, signaling that Semantic Kernel and AutoGen are now in maintenance mode. The framework's MS Learn documentation states it directly: "Agent Framework is the next generation of both Semantic Kernel and AutoGen."
The Architecture in Three Layers
The framework organizes around three distinct layers, each independently usable. At the bottom sit Chat Clients -- thin wrappers over LLM provider APIs that implement a SupportsChatGetResponse protocol. You can use an OpenAIChatClient or FoundryChatClient directly without ever touching the Agent abstraction. The middle layer is the Agent, which composes a chat client with instructions, tools, middleware, and session management. On top sits the Workflow layer -- a graph-based execution engine where agents, Python functions, and deterministic logic connect via typed data-flow edges with checkpointing and time-travel support.
This layering is a deliberate design choice documented in ADR-0001. The team studied AutoGen, OpenAI Agent SDK, Google ADK, AWS Strands, LangGraph, Agno, and the A2A protocol before settling on a response model that separates "Primary" output (the answer) from "Secondary" output (tool calls, reasoning traces, handoff signals). Non-streaming calls return a clean AgentResponse with a .text property; streaming returns an async iterable of AgentResponseUpdate deltas. This avoids the common pitfall where callers must filter housekeeping events out of the response stream just to show the user an answer.
Twelve Design Decisions That Define the Framework
The docs/decisions directory contains 23 architectural decision records (ADRs), each comparing the framework's approach against LangChain, CrewAI, LlamaIndex, and other ecosystems. Here are the twelve most consequential choices:
1. Provider-Leading Client Naming (ADR-0021). The OpenAIChatClient name refers to the provider, not the API surface. The Responses API (not Chat Completions) is the default, reflecting OpenAI's own recommendation. The older Chat Completions API is accessible as OpenAIChatCompletionClient. This means the "obvious" class name gives you the modern API without reading documentation.
2. Middleware Over Callbacks (ADR-0007). After analyzing 14 competing frameworks, the team chose an ASP.NET-style middleware pipeline over observer callbacks. Three distinct middleware types -- AgentMiddleware, ChatMiddleware, and FunctionMiddleware -- intercept execution at the agent, LLM-call, and tool-invocation levels respectively. Each receives a mutable context and calls next(), enabling pre/post processing, short-circuiting, and exception handling. This is more powerful than LangChain's read-only BaseCallbackHandler because middleware can modify requests and responses flowing through the pipeline.
3. OpenTelemetry via Wrapper Pattern (ADR-0003). Rather than embedding telemetry into the base Agent class (which would violate single-responsibility), the framework provides an OpenTelemetryAgent wrapper applied with .with_open_telemetry(). It captures spans for every agent.run() and agent.run_streaming() call, recording token usage, response times, and error codes -- without a single line of tracing code touching the core Agent implementation.
4. User Approvals as Content Types (ADR-0006). Human-in-the-loop approval is modeled as ApprovalRequestContent and ApprovalResponseContent -- first-class content types that flow through the same message pipeline as text and images. When an agent needs approval, it returns an ApprovalRequestContent, ending the current run. The caller obtains approval however it wants (UI dialog, Slack bot, email) and resumes with an ApprovalResponseContent on the same thread. This design works for both co-located and remote agent hosting, unlike callback-based approaches that cannot survive process suspension.
5. Feature Collections for Extensibility (ADR-0014). Rather than adding every new capability (structured output, custom history stores) as a strongly-typed property on AgentRunOptions, the framework uses a loosely-typed AgentFeatureCollection. Components that support a feature check for it; those that don't simply ignore it. This means new cross-cutting concerns can be added without breaking the base abstraction's API surface.
6. AsyncLocal Run Context (ADR-0015). During an agent run, middleware, tools, and nested agents need access to the current session, request messages, and run options. The framework combines explicit parameter passing (for testability) with AsyncLocal ambient context (for deep call stacks). This means a tool function can access Agent.current_run_context.session without the session being threaded through every function signature.
7. Context Compaction Strategies (ADR-0019). Long-running agents that make dozens of tool calls accumulate unbounded message history. The framework introduces composable CompactionStrategy objects that can summarize older messages, truncate to token budgets, or remove redundant tool-call/result pairs. Critically, these operate within the tool loop -- not just at run boundaries -- because the team identified that middleware-based approaches only affect individual LLM calls while the underlying message list keeps growing.
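The composable-strategy idea can be sketched without the framework. Below, each strategy is simply a function from a message list to a message list; the tuple message shape and the token_budget/drop_tool_pairs names are hypothetical stand-ins, not the real CompactionStrategy API.

```python
# Hypothetical message shape: (role, text, token_count)

def token_budget(budget):
    """Keep the system message plus the most recent messages that fit the budget."""
    def strategy(messages):
        system, rest = messages[0], messages[1:]
        kept, total = [], system[2]
        for msg in reversed(rest):          # walk newest-first
            if total + msg[2] > budget:
                break
            kept.append(msg)
            total += msg[2]
        return [system] + kept[::-1]        # restore chronological order
    return strategy


def drop_tool_pairs(messages):
    """Remove resolved tool-call/result pairs, keeping conversational messages."""
    return [m for m in messages if m[0] not in ("tool_call", "tool_result")]


def compact(messages, strategies):
    """Apply strategies in order -- the composition the ADR describes."""
    for strategy in strategies:
        messages = strategy(messages)
    return messages
```

Running compaction inside the tool loop means `compact()` would be invoked between tool iterations, so the list the next LLM call sees has already been pruned.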
8. Subpackage-Per-Feature with Lazy Loading (ADR-0008). The Python packaging uses a vendor-based namespace approach (agent_framework.openai, agent_framework.anthropic) with lazy imports that raise helpful error messages when a subpackage is not installed. This means pip install agent-framework-core gives you zero provider dependencies; you add only what you use.
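Lazy vendor gateways of this kind typically build on PEP 562 module-level `__getattr__`. The sketch below shows the idea in miniature; the load_vendor helper and its error message are illustrative, not the framework's actual gateway code.

```python
import importlib

# Map of vendor namespace -> pip package that provides it (names follow the article).
_VENDOR_PACKAGES = {
    "anthropic": "agent-framework-anthropic",
    "openai": "agent-framework-openai",
}


def load_vendor(name: str):
    """Import a vendor subpackage, raising a helpful error if it is not installed."""
    try:
        return importlib.import_module(f"agent_framework.{name}")
    except ImportError as exc:
        pip_name = _VENDOR_PACKAGES.get(name, f"agent-framework-{name}")
        raise ImportError(
            f"The '{name}' integration is not installed. "
            f"Install it with: pip install {pip_name}"
        ) from exc
```

The payoff is that a bare `pip install agent-framework-core` carries zero provider SDKs, while a typo'd or missing integration fails with an actionable message instead of a raw ModuleNotFoundError.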
9. Chat History Persistence Consistency (ADR-0022). The framework identified that service-managed history (OpenAI Responses with store=true) and local ChatHistoryProvider storage can diverge during multi-step tool calling. The solution is opt-in per-service-call persistence via RequirePerServiceCallChatHistoryPersistence, so developers can choose between atomic per-run writes (safe default) and service-matching per-call writes (for crash recovery).
10. AG-UI Protocol Support (ADR-0010). The AG-UI (Agent-User Interaction) protocol enables streaming communication between agents and frontend applications. Agent Framework implements it with internal event types converted to framework-native abstractions at boundaries, protecting consumers from protocol changes. This makes .NET and Python agents accessible from any AG-UI-compatible client (LangGraph UIs, CrewAI frontends, etc.).
11. Skills as Progressive Disclosure (ADR-0021). The Agent Skills system presents skills to the model as three tools: load_skill(), read_skill_resource(), and run_skill_script(). The model decides when to load which skill, avoiding the upfront cost of injecting all skill content into every prompt. Skills can come from filesystem SKILL.md files, inline Python code, or class-based libraries -- all unified behind abstract base types.
12. Long-Running Operations (ADR-0009). For tasks that take minutes or hours (code generation, deep research), the framework supports starting an operation, polling for status, and retrieving results -- modeled consistently across OpenAI Responses, Azure AI Foundry Agents, and the A2A protocol.
Key Features at a Glance
- Multi-Provider Support: OpenAI, Azure OpenAI, Anthropic, Claude, Bedrock, Ollama, Foundry Local, GitHub Copilot, and Copilot Studio -- each as an independent pip-installable package
- Graph-Based Workflows: Connect agents and deterministic functions via typed edges with streaming, checkpointing, human-in-the-loop, and time-travel (replay from any checkpoint)
- Five Orchestration Patterns: Sequential, Concurrent, Group Chat, Handoff, and Magentic orchestration out of the box
- Hosting Support: A2A protocol, Azure Functions, Durable Task (for durable agents that survive process restarts), and AG-UI for frontend streaming
- DevUI: Interactive browser-based developer UI for testing agents and debugging workflows visually
- Evaluation: Built-in LocalEvaluator, evaluate_agent(), and evaluate_workflow(), plus Foundry Evals integration (experimental)
Under the Hood: Python Package Architecture
The Python side ships as a uv workspace with 24 packages. The meta-package agent-framework depends solely on agent-framework-core[all]==1.0.0, which in turn pulls every released subpackage. Three packages have reached released status: core, openai, and foundry. The remaining 21 are in beta. Code quality is enforced with Ruff (120-character lines, Bandit security checks, isort, bugbear), Pyright for type checking, and pytest with async support.
The package design document defines three tiers. Tier 0 (importable from agent_framework) covers agents, tools, types, middleware, sessions, and workflows. Tier 1 (agent_framework.<component>) covers vector data, text search, exceptions, evaluation, and observability. Tier 2 (agent_framework.<vendor>) covers provider integrations. All vendor namespaces use lazy loading with informative error messages when dependencies are missing.
```python
# The three tiers in practice:
from agent_framework import Agent, tool, Message           # Tier 0 - core
from agent_framework import WorkflowBuilder, Executor      # Tier 0 - workflows
from agent_framework.openai import OpenAIChatClient        # Tier 2 - provider
from agent_framework.anthropic import AnthropicChatClient  # Tier 2 - provider (beta)
```
A Feasible Demo Plan in Python
Here is a five-stage demo plan that progressively showcases the framework's capabilities, from a minimal "hello agent" to a production-ready workflow with observability. Each stage builds on the previous one, and I have verified that the required sample code exists in the repository.
Stage 1: Hello Agent (5 minutes). Install the framework, create an agent with OpenAIChatClient, and run a single prompt. Demonstrates the three-line minimum: create client, create agent, call agent.run(). Then show streaming with agent.run("...", stream=True). Source: 01_hello_agent.py.
```python
import asyncio

from agent_framework import Agent
from agent_framework.openai import OpenAIChatClient


async def main():
    agent = Agent(
        client=OpenAIChatClient(),
        name="HelloAgent",
        instructions="You are a friendly assistant. Keep answers brief.",
    )

    # Non-streaming
    print(await agent.run("What is the capital of France?"))

    # Streaming
    async for chunk in agent.run("Tell me a fun fact.", stream=True):
        if chunk.text:
            print(chunk.text, end="", flush=True)


asyncio.run(main())
```
Stage 2: Tools and Function Calling (10 minutes). Add a @tool-decorated Python function and show how the agent autonomously decides to call it. Highlight the approval_mode parameter -- set to "always_require" for production, "never_require" for demos. Show how tool definitions auto-generate JSON schemas from type annotations and docstrings. Source: 02_add_tools.py.
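Schema auto-generation from annotations and docstrings is the core trick behind @tool. The standalone function below is a deliberately simplified illustration of the mechanism -- not the framework's actual generator, which handles many more types -- built only on inspect and typing.

```python
import inspect
from typing import get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn):
    """Build a JSON-schema-style tool definition from annotations and the docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    properties = {name: {"type": _JSON_TYPES.get(tp, "string")} for name, tp in hints.items()}
    required = [
        name
        for name, param in inspect.signature(fn).parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str, units: str = "metric") -> str:
    """Return the current weather for a city."""
    return f"Sunny in {city} ({units})"
```

Note how `units` drops out of the required list because it has a default -- the same reason well-chosen defaults in tool signatures reduce what the model must supply.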
Stage 3: Multi-Agent Orchestration (15 minutes). Create a Writer agent and a Reviewer agent. Demonstrate a sequential orchestration where the writer drafts, the reviewer critiques, and the writer refines. Then upgrade to a formal SequentialOrchestrator from the orchestrations package. Source: orchestrations samples.
Stage 4: Graph-Based Workflows (15 minutes). Build a workflow using WorkflowBuilder with class-based and function-based executors connected by edges. Show how ctx.send_message() passes data between nodes and ctx.yield_output() produces final results. Demonstrate checkpointing, and then replay from a checkpoint to show "time-travel" debugging. Source: 05_first_workflow.py and checkpoint samples.
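Checkpointing and replay are easiest to understand on a toy graph. The MiniWorkflow class below is a deliberately tiny stand-in for WorkflowBuilder, not framework code: it snapshots (node, message) before each step, so execution can be resumed from any saved index -- the essence of "time-travel" debugging.

```python
import copy


class MiniWorkflow:
    """Toy data-flow graph: node name -> (fn, next_node). Checkpoints before each step."""

    def __init__(self, nodes, start):
        self.nodes = nodes
        self.start = start
        self.checkpoints = []

    def run(self, message, resume_from=None):
        if resume_from is not None:
            # Time-travel: restore state from a saved checkpoint.
            node, message = copy.deepcopy(self.checkpoints[resume_from])
        else:
            node = self.start
        while node is not None:
            self.checkpoints.append(copy.deepcopy((node, message)))
            fn, next_node = self.nodes[node]
            message = fn(message)
            node = next_node
        return message


nodes = {
    "upper": (str.upper, "exclaim"),
    "exclaim": (lambda s: s + "!", None),
}
wf = MiniWorkflow(nodes, "upper")
```

In the real engine the checkpoint also captures executor state and pending messages, which is what lets a replayed run diverge deterministically from any point.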
Stage 5: Middleware, Observability, and DevUI (15 minutes). Wrap the agent with .with_open_telemetry() and show traces in Jaeger or the console exporter. Add a custom ChatMiddleware that logs every LLM call. Finally, launch DevUI for an interactive visual interface for testing and debugging. Source: observability samples and DevUI package.
The Five Orchestration Patterns
The agent-framework-orchestrations package is where the framework moves from single-agent scripting to genuine multi-agent systems. Every pattern is exposed through its own *Builder class, and every builder produces a Workflow that can itself be wrapped as an Agent -- meaning any pattern can be composed inside any other. All five builders are importable from the same namespace: from agent_framework.orchestrations import SequentialBuilder, ConcurrentBuilder, HandoffBuilder, GroupChatBuilder, MagenticBuilder. Understanding when and why to pick each one is the central skill in working with this framework.
Sequential: The Assembly Line. SequentialBuilder chains participants one after another, threading a shared conversation list through each step. Agent A receives the task, appends its response, and passes the enriched context to Agent B. No agent runs until the previous one has completed. This makes it the right choice when output quality depends on a strict ordering of expertise: a researcher gathers facts, a writer transforms them into prose, a reviewer critiques the draft. The pattern maps directly to pipeline thinking and is the easiest to reason about and debug. Internally the workflow injects small normaliser and converter nodes between participants -- these appear as ExecutorInvoke events in the stream but can be safely ignored. Source: sequential_agents.py.
```python
from agent_framework.orchestrations import SequentialBuilder

# writer drafts, reviewer critiques -- in strict order
workflow = SequentialBuilder(participants=[writer, reviewer]).build()

async for event in workflow.run_stream("Write a tagline for a cloud IDE."):
    if hasattr(event, "text") and event.text:
        print(event.text, end="", flush=True)
```
Concurrent: The Parallel Council. ConcurrentBuilder implements a fan-out / fan-in pattern: the same prompt is dispatched to every participant simultaneously, and their responses are aggregated once all have finished. The default aggregator simply concatenates the resulting list[Message] objects, but it can be replaced with a callback -- for example an LLM summariser that synthesises the parallel answers into one coherent answer. Concurrent orchestration is ideal whenever the sub-tasks are independent: gather market research, legal risk analysis, and engineering feasibility in parallel rather than serially. The pattern is meaningless if participants share state or if each agent needs the previous agent's output. A dispatcher and aggregator node are always present; they are lightweight but do count against context window if all responses are large. Source: concurrent_agents.py.
```python
from agent_framework.orchestrations import ConcurrentBuilder

# researcher, marketer, and engineer all see the same prompt at the same time
workflow = ConcurrentBuilder(participants=[researcher, marketer, engineer]).build()
result = await workflow.run("What are the opportunities in autonomous logistics?")

for msg in result:
    print(msg.role, msg.content)
```
Group Chat: The Round Table. GroupChatBuilder models a shared conversation where a selector decides which agent speaks next each round. The selector can be a pure Python function (deterministic round-robin, priority queue, topic routing) or an LLM-backed agent that reads the conversation history and names the most qualified next speaker. This is the pattern for long-form deliberation -- philosophical debates, design reviews, code walkthroughs -- where the conversation evolves dynamically and a fixed order would feel artificial. The termination condition controls when the loop stops: a maximum round count, a keyword in the last message, or a custom predicate on the GroupChatState object. An important subtlety is that the conversation history is shared across all participants; every agent sees every other agent's messages, which can inflate token usage in long sessions. Source: group_chat_simple_selector.py and group_chat_agent_manager.py.
```python
from agent_framework.orchestrations import GroupChatBuilder, GroupChatState


def round_robin(state: GroupChatState) -> str:
    names = list(state.participants.keys())
    return names[state.current_round % len(names)]


workflow = (
    GroupChatBuilder(participants=[python_expert, answer_verifier])
    .with_selection_func(round_robin)
    .with_max_rounds(6)
    .build()
)
```
Handoff: The Mesh Router. HandoffBuilder implements a decentralised routing topology. There is no central conductor. Instead, each agent is given one tool per possible handoff target, and the agent itself decides -- based on its instructions and the conversation -- whether to answer directly or invoke a handoff tool to transfer control. A triage agent processes the opening statement and routes to a billing, refund, or support specialist; each specialist can in turn hand off to another agent or return control to the user. The pattern supports an autonomous mode (.with_autonomous_mode()) in which specialists iterate independently until they invoke a handoff tool, and a simple mode in which control returns to the user after each specialist response. HandoffBuilder auto-registers the handoff tools so there is no manual wiring. The risk here is cycle detection: without explicit termination conditions, two agents that each believe the other is more qualified can ping-pong indefinitely. Source: handoff_simple.py and handoff_autonomous.py.
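Since the framework leaves termination to you, a hop-count guard is a simple mitigation worth wiring in. The sketch below is generic Python, not HandoffBuilder API: `route` stands in for whatever decides the next agent (in the real pattern, the agent's own handoff tool call), and the guard aborts once the chain exceeds a budget.

```python
def guarded_handoff(route, start, message, max_hops=5):
    """Follow handoff decisions, aborting if agents ping-pong past max_hops."""
    agent, hops, visited = start, 0, []
    while hops < max_hops:
        visited.append(agent)
        next_agent = route(agent, message)  # next agent name, or None to stop
        if next_agent is None:
            return agent, visited
        agent, hops = next_agent, hops + 1
    raise RuntimeError(f"Handoff loop detected after {max_hops} hops: {visited}")
```

A production version might also track the set of (agent, message-hash) pairs seen, so a genuine revisit with new information is allowed while an identical ping-pong is not.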
```python
from agent_framework.orchestrations import HandoffBuilder

workflow = (
    HandoffBuilder()
    .participants([triage, billing_agent, support_agent])
    .with_start_agent(triage)
    .build()
)

async for event in workflow.run_stream(HandoffAgentUserRequest(messages=[...])):
    print(event)
```
Magentic: The Strategic Director. MagenticBuilder is the most sophisticated of the five. It is a Python implementation of the Magentic-One pattern published by Microsoft Research in late 2024. Rather than a fixed graph or simple selection function, Magentic runs a continuous plan-act-observe-update loop governed by a progress ledger. At each step the manager agent reads the full task description, the facts gathered so far, and the team's capabilities, then decides which agent to invoke and with what sub-instruction. After each agent completes its step the manager updates the ledger -- marking facts as verified or stalled -- and plans the next move. The loop continues until the ledger shows all facts resolved or the manager declares the task complete. This pattern shines on complex, open-ended tasks where the solution path is not known in advance: research synthesis, multi-step code generation, competitive analysis. It is also the most expensive in tokens and the most sensitive to the quality of the manager agent's model. The MagenticProgressLedger and MagenticContext data classes are exposed publicly so you can inspect and checkpoint the reasoning state mid-run. Human-in-the-loop variants (magentic_human_plan_review.py) let a human edit the plan before execution begins, a safeguard worth enabling in any production deployment. Source: magentic.py.
```python
from agent_framework.orchestrations import MagenticBuilder, MagenticProgressLedger

workflow = (
    MagenticBuilder(
        participants=[researcher_agent, coder_agent],
        manager_agent=manager_agent,
    )
    .with_max_turns(20)
    .build()
)

async for event in workflow.run_stream("Estimate CO2 emissions for training GPT-4."):
    if isinstance(event, MagenticProgressLedger):
        print("Ledger update:", event)
```
Choosing the right pattern is a first-class architectural decision. Sequential is predictable but blocks on every step. Concurrent is fast but produces independent outputs that may contradict each other. Group Chat is flexible but token-heavy and hard to terminate cleanly. Handoff is elegant for triage workflows but requires careful cycle prevention. Magentic is the most capable but the most expensive and opaque. The good news is that all five are composable: a MagenticBuilder workflow can include a ConcurrentBuilder sub-workflow as one of its participants, enabling patterns like parallel research followed by strategic synthesis.
Risk Assessment: Dependencies, Lock-In, and High-Risk Issues
No framework analysis is complete without an honest look at the risks. After reading every ADR, the package status file, the pyproject.toml dependency tree, and the repository's 581 open issues, here are the risks I would flag before adopting this framework in production.
| Risk | Severity | Details |
|---|---|---|
| Azure Foundry Gravity | HIGH | The "happy path" in all documentation and samples uses FoundryChatClient with Azure AI Foundry endpoints. FoundryAgent is a dedicated first-class Agent subclass, while other providers are reached through the generic Agent(client=X) composition pattern. This creates an implicit tilt toward Azure even though the framework technically supports any provider. Teams that start with Foundry-specific features (hosted agents, evals integration, service-managed history) will find that switching providers requires significant refactoring. |
| Beta Package Maturity | HIGH | 21 of 24 Python packages are in beta. Only core, openai, and foundry have reached GA. The beta packages -- including Anthropic, Bedrock, A2A, orchestrations, and DurableTask -- may have breaking changes between versions. Production systems using non-OpenAI providers or advanced hosting patterns are building on shifting ground. |
| Semantic Kernel / AutoGen Deprecation | MED | Microsoft positions Agent Framework as the successor to both Semantic Kernel and AutoGen, with explicit migration guides. Organizations invested in either predecessor face a migration. The framework's API surface differs substantially -- tools use @tool instead of SK's kernel plugins, workflows replace AutoGen's conversation patterns. The migration is not a drop-in replacement. |
| Rapid API Churn | MED | The ADR naming history reveals active churn: OpenAIResponsesClient was renamed to OpenAIChatClient, AzureAIClient was deprecated in favor of FoundryChatClient, and model_id / deployment_name were unified to model. The framework is at 1.0.0 but the deprecation list is already long. Import paths are stable by design (lazy-loading gateways), but class names may continue shifting. |
| Context Compaction is New | MED | ADR-0019 acknowledges that the current architecture has "no way to compact messages during the tool loop." The compaction strategy design is accepted but still being implemented. Long-running agents that make many tool calls will hit context window limits until this is fully operational. |
| Experimental Features in Core | MED | Two feature sets -- EVALS (evaluation APIs) and SKILLS (agent skills) -- are decorated as experimental even within the released core package. Using evaluate_agent() or SkillsProvider will emit deprecation-style warnings. These APIs may change without following semver. |
| LiteLLM Supply Chain Pin | MED | The root pyproject.toml includes a constraint: litellm<1.82.7, pinned to avoid "compromised 1.82.7/1.82.8 releases." This is a direct reference to a supply-chain security incident in a transitive dependency. While the pin mitigates the specific issue, it signals that the dependency tree includes packages with a history of compromise. |
| Third-Party Data Flow | LOW | The README includes a warning: "If you use Microsoft Agent Framework to build applications that operate with third-party servers or agents, you do so at your own risk." MCP server connections, A2A protocol interactions, and multi-provider setups all route data through external services. There is no built-in data-loss-prevention middleware. |
| Open Issue Volume | LOW | 581 open issues and 166 open PRs as of April 2026. The volume is expected for a framework this new with this many contributors (121), but indicates active development and potential instability in edge cases. |
Lock-In Vectors to Watch
Azure AI Foundry as the default identity layer. The quickstart uses AzureCliCredential() from azure-identity. The FoundryChatClient is the only client that supports service-managed conversation history, hosted agents, and Foundry Evals out of the box. If you use these features, migrating to a non-Azure backend requires replacing both the client and the infrastructure it provides.
Durable Task framework for durable agents. The durabletask package enables agents that survive process restarts by persisting execution state. This is built on the Microsoft Durable Task Framework, which is itself closely tied to Azure Durable Functions. Running durable agents outside Azure requires self-hosting the Durable Task sidecar.
Azure Functions hosting. The azurefunctions package provides first-class integration for hosting agents as Azure Function triggers. There is no equivalent AWS Lambda or Google Cloud Functions package. Teams deploying to non-Azure clouds would need to build their own hosting integration.
Microsoft Purview middleware. The purview package adds content safety and compliance middleware using Azure Purview. There is no generic content-safety middleware abstraction -- it is Azure-specific.
Why I Like It
Despite the Azure gravity, three things stand out positively. First, the middleware architecture is genuinely well-designed. Having three distinct interception points (agent, chat, function) with mutable contexts is more powerful than anything in the competing frameworks I have reviewed.
Second, the ADR process is exemplary -- every major decision includes a comparison table against LangChain, AutoGen, CrewAI, Google ADK, and AWS Strands. This is engineering transparency at its best.
Third, the workflow engine with checkpointing and time-travel is production-grade infrastructure that most agent frameworks do not attempt. Being able to replay an agent workflow from any saved checkpoint is a meaningful capability for debugging non-deterministic AI behavior.
Community and Contribution
The project is maintained by Microsoft employees and has grown to 121 contributors across 69 releases.
A public Discord server hosts weekly office hours.
The CONTRIBUTING.md is straightforward: file issues, use pre-commit hooks for Python, include tests for new features. The code is organized so that Python and .NET implementations are side-by-side in the same monorepo, which helps maintain API parity across languages. Pull requests are actively reviewed -- 166 open at the time of writing -- and Copilot-assisted code review is visibly in use.
Usage and License Terms
The framework is released under the MIT License, which permits commercial use, modification, distribution, and private use with no restrictions beyond preserving the copyright notice. This is the most permissive common open-source license. The only contribution requirement is the standard Microsoft Contributor License Agreement for pull requests.
Impact Potential
Agent Framework has the potential to become the default choice for enterprise AI agent development, particularly in Azure-centric organizations. Its combination of a clean agent abstraction, production middleware, graph workflows with checkpointing, and first-class multi-provider support puts it in a unique position.
The risk is that the Azure-first documentation and beta package maturity create a self-fulfilling prophecy where only Azure users adopt it, limiting the framework's reach. If Microsoft can stabilize the non-Azure provider packages and invest equally in documentation for OpenAI-direct and Anthropic-direct users, the framework could genuinely become a cross-cloud standard. The open-source MIT license removes the legal barrier; the question is whether the engineering investment follows.
Conclusion
Microsoft Agent Framework is the most architecturally rigorous agent framework I have reviewed. Its ADR-driven design process, three-layer middleware system, and graph-based workflow engine with time-travel set it apart from LangChain, CrewAI, and the OpenAI Agent SDK. The Python demo plan above can be executed in under an hour with an OpenAI API key -- no Azure account required for the first four stages. The primary risks are the framework's gravitational pull toward Azure Foundry, the immaturity of non-core packages, and the ongoing API churn inherent in a 1.0.0 release with 23 accepted ADRs. For teams already on Azure, this is likely the right choice. For multi-cloud or cloud-agnostic teams, adopt the core and OpenAI packages (which are GA) but treat provider-specific packages and hosting integrations as experimental until they reach release status.
Explore the repository at github.com/microsoft/agent-framework, try the getting started samples, and join the Discord community to share your experience. If you are migrating from Semantic Kernel or AutoGen, start with the migration guides.