Foundry Local 1.0 proved the concept: run AI models locally on Windows, macOS (Apple Silicon), and Linux x64 with a developer-friendly SDK. Version 1.1 adds three capabilities that cover many production use cases: live audio transcription, text embeddings, and the Responses API.
Live Audio Transcription
The most significant new feature: real-time speech-to-text streaming directly from the microphone. Captions, voice UIs, meeting transcription, accessibility tooling — all running locally with zero cloud dependency.
The API is session-based and streams results as they arrive, with is_final markers to distinguish interim from finalized text. Available across all language bindings: JavaScript, C#, Python, and Rust.
The workflow: load a streaming speech model from the catalog, create a session with audio settings (sample rate, channels, language), start it, push raw PCM audio chunks, and consume the async stream of results. The original post has full Python and C# examples.
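The push-and-consume pattern can be sketched without the SDK itself. The snippet below is a minimal illustration, not the Foundry Local API: `TranscriptResult`, `pcm_chunks`, `fake_results`, and `collect_transcript` are hypothetical names, the result stream is a stand-in generator, and 16-bit samples with 100 ms chunks are assumptions. It shows how raw PCM might be cut into fixed-duration chunks and how is_final can separate interim previews from finalized text.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Iterator, List


@dataclass
class TranscriptResult:
    """Hypothetical result shape: recognized text plus the is_final marker."""
    text: str
    is_final: bool


def pcm_chunks(pcm: bytes, sample_rate: int, channels: int,
               chunk_ms: int = 100, bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Split raw PCM (assumed 16-bit) into fixed-duration chunks."""
    chunk_bytes = sample_rate * channels * bytes_per_sample * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


async def fake_results() -> AsyncIterator[TranscriptResult]:
    """Stand-in for a session's result stream (not the real SDK)."""
    samples = [("hello", False), ("hello wor", False), ("hello world", True),
               ("how are", False), ("how are you", True)]
    for text, final in samples:
        await asyncio.sleep(0)  # yield control, as a real stream would
        yield TranscriptResult(text, final)


async def collect_transcript(results: AsyncIterator[TranscriptResult]) -> List[str]:
    """Keep finalized segments; interim text is only printed as a live preview."""
    finalized: List[str] = []
    async for result in results:
        if result.is_final:
            finalized.append(result.text)
        else:
            print(f"(interim) {result.text}")
    return finalized


if __name__ == "__main__":
    one_second = bytes(16000 * 2)  # 1 s of silence: 16 kHz, mono, 16-bit
    print(len(list(pcm_chunks(one_second, sample_rate=16000, channels=1))))
    print(asyncio.run(collect_transcript(fake_results())))
```

In a caption UI, the interim branch would overwrite the current line on screen, while finalized segments are appended to the permanent transcript.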
Text Embeddings
Semantic search, RAG pipelines, clustering, similarity matching — these all require embeddings. Foundry Local 1.1 adds embedding model support so you can generate vectors locally from the same SDK, without sending data to a cloud endpoint.
For applications where data residency matters or where you’re processing sensitive content, local embedding generation is a meaningful capability.
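Once vectors exist, semantic search reduces to comparing them. The following is a minimal sketch of cosine-similarity ranking in plain Python; the three-dimensional vectors and document names are made up for illustration, and in practice the vectors would come from whichever embedding model you load locally. `cosine_similarity` and `rank_documents` are hypothetical helper names, not part of the SDK.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Score every document against the query, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (which have far more dimensions).
docs = {
    "invoice-policy": [0.9, 0.1, 0.0],
    "vacation-policy": [0.1, 0.9, 0.1],
    "onboarding": [0.3, 0.3, 0.8],
}
query = [0.8, 0.2, 0.1]

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

For a RAG pipeline, the top-ranked documents would then be passed to the model as context; larger corpora usually swap the linear scan for a vector index.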
Responses API
Foundry Local now supports the Responses API — the structured interface designed for agentic interactions. This adds:
- Tool calling — let locally-running models invoke tools you define
- Multimodal vision-language input — pass image + text to vision-capable models
- Compatible with the standard API shape, so existing agents targeting OpenAI’s Responses API work against local models
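To make the tool-calling loop concrete, here is a minimal sketch of the client side, with no network calls. The item shapes (`function_call` with a JSON-string `arguments` field and a `call_id`, answered by a `function_call_output` item) follow OpenAI's Responses API function-calling format; that Foundry Local returns exactly these fields is an assumption. `get_weather`, `REGISTRY`, and `run_tool_call` are hypothetical names for illustration.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> Dict[str, Any]:
    """Hypothetical local tool; returns canned data instead of a real lookup."""
    return {"city": city, "forecast": "sunny"}


# Tool definition in the Responses API function-tool shape (JSON Schema parameters).
TOOLS = [{
    "type": "function",
    "name": "get_weather",
    "description": "Look up the forecast for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Maps tool names the model may request to the Python functions that run them.
REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(item: Dict[str, Any]) -> Dict[str, Any]:
    """Execute one function_call item and build the matching output item."""
    if item.get("type") != "function_call":
        raise ValueError("not a function_call item")
    func = REGISTRY[item["name"]]
    args = json.loads(item["arguments"])  # arguments arrive as a JSON string
    result = func(**args)
    return {
        "type": "function_call_output",
        "call_id": item["call_id"],  # ties the result back to the request
        "output": json.dumps(result),
    }


# An item like one the model might emit (hand-written here, assumed shape).
example_call = {
    "type": "function_call",
    "name": "get_weather",
    "arguments": '{"city": "Oslo"}',
    "call_id": "call_123",
}
reply = run_tool_call(example_call)
print(reply)
```

The reply item would be appended to the input of the next request so the model can use the tool result in its answer.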
Package Size Improvements
Two changes reduce the JavaScript package size:
- The koffi FFI layer has been replaced with a custom Node-API C addon
- WebGPU execution provider ships as a separate plugin, so applications that don’t need GPU acceleration don’t pay the size cost
The C# SDK now targets lower framework versions for broader .NET compatibility.
Why This Matters
The three capabilities together — transcription, embeddings, tool calling — cover the core building blocks of many AI applications. Running them locally means:
- No internet required
- No per-token costs
- No data leaving the machine
- Consistent latency regardless of network conditions
Foundry Local fits edge scenarios, privacy-sensitive workloads, offline applications, and any project where you want to avoid cloud dependency during development.
Original post: Foundry Local 1.1: Live Transcription, Embeddings, and Responses API
