Python | The .NET Blog

CodeAct in Agent Framework: How to Cut Your Agent's Latency in Half

Emiliano Montesdeoca — Sat, 25 Apr 2026 00:00:00 +0000

There’s a moment in every agent project where you look at the trace and think: “why is this taking so long?” The model is fine. The tools work. But there are seven round trips to get a result you could compute in one shot.

That’s exactly the problem CodeAct solves — and the Agent Framework team just shipped alpha support for it via a new agent-framework-hyperlight package.

What is CodeAct?

The CodeAct pattern is elegantly simple: instead of giving the model a list of tools and letting it call them one by one, you give it a single execute_code tool and let it express the entire plan as a short Python program. The agent writes the code once, the sandbox runs it, and you get back a single consolidated result.

A five-step plan that used to be five model turns becomes one execute_code turn containing a Python script that calls your tools via call_tool(...).

The benchmark in the repo makes this concrete. Eight users, dozens of orders, five tools (list users, get orders, discount rate, tax rate, compute line total). Same model, same tools, same prompt — just different wiring:

Wiring	Time	Tokens
Traditional	27.81s	6,890
CodeAct	13.23s	2,489
Improvement	52.4%	63.9%

That’s not a micro-benchmark. That’s a realistic workload with real orchestration overhead.

The safety piece: Hyperlight micro-VMs

Here’s the thing that made me actually excited about this: safety has historically been CodeAct’s Achilles heel. If you’re running model-generated code, where exactly is it running? Against your process? In a shared container?

The agent-framework-hyperlight package solves this with Hyperlight micro-VMs. Every single execute_code call gets its own freshly created micro-VM — with its own memory, no host filesystem access beyond what you explicitly mount, and no network access beyond the domains you allow. Startup is measured in milliseconds. The isolation is basically free.

Your tools still run on the host (they’re your code, with your access). The model-generated glue — the Python that decides which tools to call and in what order — runs sandboxed. That’s the right split.

Wiring it up

The minimal setup is straightforward:

from agent_framework import Agent, tool
from agent_framework_hyperlight import HyperlightCodeActProvider

@tool
def get_weather(city: str) -> dict[str, float | str]:
 """Return the current weather for a city."""
 return {"city": city, "temperature_c": 21.5, "conditions": "partly cloudy"}

codeact = HyperlightCodeActProvider(
 tools=[get_weather],
 approval_mode="never_require",
)

agent = Agent(
 client=client,
 name="CodeActAgent",
 instructions="You are a helpful assistant.",
 context_providers=[codeact],
)

result = await agent.run(
 "Get the weather for Seattle and Amsterdam and compare them."
)

The provider registers execute_code on every run and injects the CodeAct instructions into the system prompt automatically. You don’t need to write a custom prompt fragment.

Mixing CodeAct with approval-gated tools

This is where it gets interesting. Not every tool should run inside the sandbox without approval. You might want to gate send_email or charge_credit_card individually. The framework handles this cleanly:

@tool(approval_mode="always_require")
def send_email(to: str, subject: str, body: str) -> str:
 """Send an email. Requires approval on every call."""
 ...

agent = Agent(
 client=client,
 name="MixedToolsAgent",
 instructions="You are a helpful assistant.",
 context_providers=[codeact],
 tools=[send_email], # invoked directly, approval-gated
)

Tools on the provider → the model reaches them via call_tool(...) inside the sandbox, cheap and chainable.
Tools on the agent directly → the model calls them as first-class tool calls, approval applies individually.

That’s a clean split: chainable data-lookup tools go through CodeAct, side-effect tools stay on the agent.

When to use CodeAct (and when not to)

Reach for CodeAct when:

The task chains many small tool calls (lookups, joins, computations, formatting)
You care about latency and token cost
You want strong per-call isolation on model-generated code by default
Tools are cheap and safe to invoke in sequence

Stick with traditional tool-calling when:

The agent only makes one or two tool calls per turn
Each tool has side effects you want approved individually
Tool descriptions are sparse or ambiguous — CodeAct relies on good docstrings

That last point matters. Because the model writes Python that calls your tools by name, docstrings and parameter annotations become part of the contract the model reasons about. Weak descriptions hurt CodeAct more than traditional tool-calling.

Try it now

pip install agent-framework-hyperlight --pre
# or
uv add --prerelease=allow agent-framework-hyperlight

Samples are under python/packages/hyperlight/samples/. The benchmark sample is the best place to start — run it against your own tools to see if the wins apply to your workload.

Worth noting: Linux and Windows are supported today. macOS support is on the way. A .NET counterpart is also coming, so if you’re on C#, keep an eye on the repo.

Wrapping up

CodeAct isn’t magic — it’s a sensible pattern that was just too risky to use without proper sandboxing. Hyperlight changes that equation. Per-call micro-VM isolation, millisecond startup, 50%+ latency improvement on the right workloads. That’s a combination worth experimenting with.

Check the full post on the Agent Framework blog for deeper coverage on filesystem mounts, network policy, and the standalone HyperlightExecuteCodeTool wiring.

Where Does Your Agent Remember Things? A Practical Guide to Chat History Storage

Emiliano Montesdeoca — Sat, 25 Apr 2026 00:00:00 +0000

When you build an AI agent, you spend most of your energy on the model, the tools, and the prompts. The question of where the conversation history lives feels like an implementation detail — but it’s actually one of the most important architectural decisions you’ll make.

It determines whether users can branch conversations, undo responses, resume sessions after a restart, and whether your data ever leaves your infrastructure. The Agent Framework team published a deep dive on this and it’s worth understanding the full landscape.

Two fundamental patterns

Service-managed: the AI service stores the conversation state. Your app holds a reference (a thread ID, a response ID) and the service automatically includes relevant history on each request. Simpler to set up. Less control.

Client-managed: your app maintains the full history and sends relevant messages with every request. The service is stateless. You control everything — what gets sent, how it’s compressed, where it lives.

Neither is universally better. The right choice depends on what you’re building.

Service-managed: linear vs forking

Not all service-managed storage is the same. There are two distinct models:

Linear (single-threaded): messages form an ordered sequence. You can append, but you can’t branch. This is the traditional chat model — used by Foundry Prompt Agents and the now-deprecated OpenAI Assistants API. Great for chatbots and support agents. Terrible if you want “try again” or parallel exploration.

Forking-capable: each response has a unique ID, and new requests can reference any previous response as the continuation point. This is what the Responses API (Microsoft Foundry, Azure OpenAI, OpenAI) supports. Users can branch conversations, build “undo” flows, explore multiple answer paths.

If you’re building any kind of agentic workflow where multiple paths might be explored, forking is a capability you want.

Client-managed: you own the complexity

When the service doesn’t store history, your app does everything:

Context window management — you can’t send unlimited history. You need truncation, sliding windows, summarization, or tool-call collapse strategies.
Persistence — in-memory works for demos. Production needs a database, Redis, or blob storage.
Privacy — conversation data never leaves your infrastructure unless you explicitly send it.

The upside on privacy is real. For sensitive applications where you can’t have conversation history sitting on a third-party server, client-managed is the only option.

Agent Framework ships built-in compaction strategies for all the common patterns, so you don’t have to build them from scratch. But you do need to choose and configure the right one.

How Agent Framework abstracts this

The beauty of the framework is that your agent invocation code stays the same regardless of which storage model you’re using. The AgentSession handles the underlying differences.

In C#:

// Works with Chat Completions (client-managed)
// AND with Responses API (service-managed)
// The session handles the details.
AgentSession session = await agent.CreateSessionAsync();
var first = await agent.RunAsync("My name is Alice.", session);
var second = await agent.RunAsync("What is my name?", session);

In Python:

session = agent.create_session()
first = await agent.run("My name is Alice.", session=session)
second = await agent.run("What is my name?", session=session)

When you switch from OpenAI Chat Completions to the Responses API, you change the client configuration — not the agent invocation code.

The Responses API is uniquely flexible

Most providers have a fixed storage model. The Responses API is the exception — it’s configurable via the store parameter:

store=true (default): service stores each response, supports forking via response IDs. Service handles compaction.
store=false: service is stateless, Agent Framework manages history client-side. You control compaction.
Conversations API: linear thread model on top of Responses. Pass a conversation ID instead of a response ID.

Here’s the client-managed mode in practice (C#):

AIAgent agent = new OpenAIClient("<your_api_key>")
 .GetResponseClient("gpt-5.4-mini")
 .AsIChatClientWithStoredOutputDisabled()
 .AsAIAgent(new ChatClientAgentOptions
 {
 ChatOptions = new() { Instructions = "You are a helpful assistant." },
 ChatHistoryProvider = new InMemoryChatHistoryProvider()
 });

And in Python:

agent = Agent(
 client=OpenAIChatClient(),
 name="StatelessAgent",
 instructions="You are a helpful assistant.",
 default_options={"store": False},
 context_providers=[InMemoryHistoryProvider("memory", load_messages=True)],
)

Swap InMemoryHistoryProvider for your DatabaseHistoryProvider when you’re ready for production persistence.

Provider quick reference

Provider	Storage	Model	Compaction
OpenAI / Azure OpenAI Chat Completions	Client	N/A	You
Foundry Agent Service	Service	Linear	Service
Responses API (default)	Service	Forking	Service
Responses API (`store=false`)	Client	N/A	You
Anthropic Claude, Ollama	Client	N/A	You

How to choose

Start with these questions:

Do you need conversation branching or “undo”? → Forking service-managed (Responses API)
Do you need full data sovereignty? → Client-managed, with a database-backed provider
Is this a simple chatbot or support flow? → Service-managed linear is fine
Do you need to migrate between providers later? → Client-managed gives you portability

The most important thing: don’t default to whatever is easiest to start with and forget to revisit it. Changing storage patterns after launch is painful.

Wrapping up

Chat history storage shapes what your agents can actually do — not just in demos but in production, under real user behavior. Agent Framework’s abstractions let you evolve your choice without rewriting your application logic, which is genuinely useful when you’re still figuring out the right model.

Read the full post for the complete decision tree, the Conversations API walkthrough, and the compaction strategy details.

azd Hooks in Python, TypeScript, and .NET: Stop Fighting Shell Scripts

Emiliano Montesdeoca — Thu, 23 Apr 2026 00:00:00 +0000

If you’ve ever had a fully .NET project and still found yourself writing Bash scripts just to run azd hooks, you know the pain. Why switch to shell syntax just for a pre-provision step when everything else in your project is C#?

That frustration is now officially solved. The Azure Developer CLI just shipped multi-language hook support, and it’s exactly as good as it sounds.

Hooks, quickly, if you’re not familiar

Hooks are scripts that run at key points in the azd lifecycle — before provisioning, after deployment, and more. They’re defined in azure.yaml and let you inject custom logic without forking the CLI itself.

Previously you were limited to Bash and PowerShell. Now you can use Python, JavaScript, TypeScript, or .NET — and azd handles the rest automatically.

How the detection works

You just point the hook at a file and azd infers the language from the extension:

hooks:
 preprovision:
 run: ./hooks/setup.py
 postdeploy:
 run: ./hooks/seed.ts
 postprovision:
 run: ./hooks/migrate.cs

That’s it. No extra config. If the extension is ambiguous, you can add an explicit kind: python (or whatever) to override.

Language-specific details worth knowing

Python

Drop a requirements.txt or pyproject.toml next to your script (or anywhere up the directory tree) and azd creates a virtual environment, installs deps, and runs your script:

hooks/
├── setup.py
└── requirements.txt

No virtualenv management on your end. azd walks up from the script location to find the nearest project file.

JavaScript and TypeScript

Same pattern — put a package.json near your script and azd runs npm install first. For TypeScript, it uses npx tsx so there’s no compile step and no tsconfig.json required:

hooks/
├── seed.ts
└── package.json

Want to use pnpm or yarn? There’s a config.packageManager option for that.

.NET

Two modes here, which is nice:

Project mode: If there’s a .csproj next to the script, azd runs dotnet restore and dotnet build automatically.
Single-file mode: On .NET 10+, you can drop a standalone .cs file and it runs directly via dotnet run script.cs. No project file needed.

hooks:
 postprovision:
 run: ./hooks/migrate.cs

If you’re already on .NET 10, single-file mode is honestly the cleanest option for simple migration or seeding scripts. No project scaffolding, no .csproj to maintain.

Executor-specific config

Each language supports an optional config block when you need to tweak the defaults:

hooks:
 preprovision:
 run: ./hooks/setup.ts
 config:
 packageManager: pnpm
 postdeploy:
 run: ./hooks/seed.py
 config:
 virtualEnvName: .venv
 postprovision:
 run: ./hooks/migrate.cs
 config:
 configuration: Release
 framework: net10.0

You can also mix formats in the same hooks: block — different languages for different lifecycle events, platform-specific overrides for Windows vs. Linux, whatever you need.

Why this matters for .NET developers

The boring answer is “consistency.” But honestly it goes deeper than that. Hooks are often the last place in an azd-based project that forces you into a different language context. Now your entire deployment pipeline — from app code to infrastructure scripts to lifecycle hooks — can live in one language.

More practically: you can now reuse your existing .NET utilities in hooks. Have a shared class library for database schema management? Just reference it in your hook project. Have a Python data-seeding script you already wrote? Drop it straight into azure.yaml.

Wrapping up

This is one of those changes that sounds small but quietly removes a lot of friction from daily azd workflows. Multi-language hook support is available now — check the official post for the full docs, and head to the azd GitHub repo to try it out on your next project.