<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>On-Device AI | The .NET Blog</title><link>https://thedotnetblog.com/tags/on-device-ai/</link><description>Articles, tutorials and insights from the .NET community.</description><generator>Hugo</generator><language>en</language><managingEditor>@thedotnetblog (The .NET Blog)</managingEditor><webMaster>@thedotnetblog</webMaster><lastBuildDate>Thu, 28 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://thedotnetblog.com/tags/on-device-ai/index.xml" rel="self" type="application/rss+xml"/><item><title>Foundry Local 1.1: Real-Time Transcription, Embeddings, and the Responses API</title><link>https://thedotnetblog.com/news/emiliano-montesdeoca/foundry-local-11-transcription-embeddings-responses-api/</link><pubDate>Thu, 28 May 2026 00:00:00 +0000</pubDate><author>Emiliano Montesdeoca</author><guid>https://thedotnetblog.com/news/emiliano-montesdeoca/foundry-local-11-transcription-embeddings-responses-api/</guid><description>Foundry Local 1.1 adds live microphone transcription, text embeddings, and Responses API support — all running locally with no cloud dependency, no network latency, no per-token cost.</description><content:encoded>&lt;p&gt;Foundry Local 1.0 proved the concept: run AI models locally on Windows, macOS (Apple Silicon), and Linux x64 with a developer-friendly SDK. Version 1.1 adds three capabilities that cover a lot of real production use cases.&lt;/p&gt;
&lt;h2 id="live-audio-transcription"&gt;Live Audio Transcription&lt;/h2&gt;
&lt;p&gt;The most significant new feature is real-time speech-to-text streaming directly from the microphone. That covers captions, voice UIs, meeting transcription, and accessibility tooling, all running locally with no cloud dependency.&lt;/p&gt;
&lt;p&gt;The API is session-based and streams results as they arrive, with &lt;code&gt;is_final&lt;/code&gt; markers to distinguish interim from finalized text. Available across all language bindings: JavaScript, C#, Python, and Rust.&lt;/p&gt;
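&lt;p&gt;To make the interim versus finalized distinction concrete, here is a minimal sketch of folding such a stream into a transcript. It does not call the Foundry Local SDK. The results are plain dictionaries, the &lt;code&gt;text&lt;/code&gt; field name is an assumption (only &lt;code&gt;is_final&lt;/code&gt; comes from the release notes), and it assumes each interim result replaces the previous one.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;
```python
def assemble_transcript(results):
    """Fold a stream of results into (finalized_text, pending_interim_text)."""
    finalized = []
    interim = ""
    for result in results:
        text = result["text"].strip()  # "text" is a hypothetical field name
        if result["is_final"]:
            if text:
                finalized.append(text)
            interim = ""  # the finalized segment supersedes any interim guess
        else:
            interim = text  # assumed: each interim result replaces the last
    return " ".join(finalized), interim


# Hypothetical results, shaped for illustration only.
sample = [
    {"text": "hello", "is_final": False},
    {"text": "hello wor", "is_final": False},
    {"text": "Hello world.", "is_final": True},
    {"text": "how are", "is_final": False},
]
final_text, pending = assemble_transcript(sample)
print(final_text)  # Hello world.
print(pending)     # how are
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A caption UI would typically render the finalized text as stable and the pending text as provisional, since the latter can still change.&lt;/p&gt;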
&lt;p&gt;The flow has five steps: load a streaming speech model from the catalog, create a session with audio settings (sample rate, channels, language), start it, push raw PCM audio chunks, and consume the async stream of results. The original post (linked at the end) has full Python and C# examples.&lt;/p&gt;
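&lt;p&gt;The "push raw PCM audio chunks" step depends on the session's audio settings. As a rough sketch, assuming 16-bit PCM (2 bytes per sample, which the release notes do not specify), this is how a chunk size follows from sample rate and channel count. The helper names are hypothetical and nothing here touches the SDK.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;
```python
def pcm_chunk_size(sample_rate, channels, chunk_ms, bytes_per_sample=2):
    """Bytes of raw PCM covering chunk_ms milliseconds of audio."""
    frames = sample_rate * chunk_ms // 1000
    return frames * channels * bytes_per_sample


def iter_pcm_chunks(pcm, chunk_bytes):
    """Yield consecutive slices of a PCM buffer; the last one may be shorter."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# 100 ms chunks of 16 kHz mono audio, assuming 16-bit samples.
size = pcm_chunk_size(sample_rate=16000, channels=1, chunk_ms=100)
one_second_of_silence = bytes(16000 * 2)
chunks = list(iter_pcm_chunks(one_second_of_silence, size))
print(size, len(chunks))  # 3200 10
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each chunk would then be handed to the transcription session. Smaller chunks lower latency at the cost of more calls.&lt;/p&gt;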
&lt;h2 id="text-embeddings"&gt;Text Embeddings&lt;/h2&gt;
&lt;p&gt;Semantic search, RAG pipelines, clustering, similarity matching — these all require embeddings. Foundry Local 1.1 adds embedding model support so you can generate vectors locally from the same SDK, without sending data to a cloud endpoint.&lt;/p&gt;
&lt;p&gt;For applications where data residency matters or where you&amp;rsquo;re processing sensitive content, local embedding generation is a meaningful capability.&lt;/p&gt;
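&lt;p&gt;Once you have vectors, semantic search comes down to comparing them. The sketch below ranks documents by cosine similarity against a query vector. The three-dimensional vectors are made-up stand-ins for real embedding output, and the function names are illustrative rather than part of any SDK.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;
```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors for illustration; real embeddings have hundreds of dimensions.
docs = {
    "install guide": [0.9, 0.1, 0.0],
    "release notes": [0.2, 0.8, 0.1],
    "license": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]
ranked = rank_by_similarity(query, docs)
print(ranked[0][0])  # install guide
```
&lt;/code&gt;&lt;/pre&gt;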
&lt;h2 id="responses-api"&gt;Responses API&lt;/h2&gt;
&lt;p&gt;Foundry Local now supports the &lt;a href="https://platform.openai.com/docs/api-reference/responses"&gt;Responses API&lt;/a&gt; — the structured interface designed for agentic interactions. This adds:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Tool calling&lt;/strong&gt; — let locally-running models invoke tools you define&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multimodal vision-language input&lt;/strong&gt; — pass image + text to vision-capable models&lt;/li&gt;
&lt;li&gt;Compatibility with the standard API shape, so existing agents targeting OpenAI&amp;rsquo;s Responses API can run against local models&lt;/li&gt;
&lt;/ul&gt;
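&lt;p&gt;For a sense of what tool calling involves on the application side, here is a small sketch that follows the function-tool shape of OpenAI's Responses API: a tool definition, a &lt;code&gt;function_call&lt;/code&gt; item as a model might emit it, and the &lt;code&gt;function_call_output&lt;/code&gt; item you send back. It makes no network calls. The &lt;code&gt;get_weather&lt;/code&gt; tool and all values are hypothetical, and whether Foundry Local mirrors every field exactly is an assumption based on the compatibility claim above.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;
```python
import json


def get_weather(city):
    """Hypothetical local tool; returns canned data for illustration."""
    return {"city": city, "forecast": "sunny"}


# Tool definition in the function-tool shape used by the Responses API.
TOOLS = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Look up the forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

HANDLERS = {"get_weather": get_weather}


def run_tool_call(item):
    """Turn a function_call output item into a function_call_output input item."""
    handler = HANDLERS[item["name"]]
    arguments = json.loads(item["arguments"])  # arguments arrive as a JSON string
    result = handler(**arguments)
    return {
        "type": "function_call_output",
        "call_id": item["call_id"],
        "output": json.dumps(result),
    }


# A function_call item as a model might emit it (values are made up).
call = {
    "type": "function_call",
    "call_id": "call_123",
    "name": "get_weather",
    "arguments": '{"city": "Madrid"}',
}
reply = run_tool_call(call)
print(reply["output"])  # {"city": "Madrid", "forecast": "sunny"}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In a real agent loop, &lt;code&gt;TOOLS&lt;/code&gt; would go out with the request and the reply item would be appended to the next request's input so the model can use the result.&lt;/p&gt;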
&lt;h2 id="package-size-improvements"&gt;Package Size Improvements&lt;/h2&gt;
&lt;p&gt;Two changes reduce the JavaScript package size:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;koffi&lt;/code&gt; FFI layer has been replaced with a custom Node-API C addon&lt;/li&gt;
&lt;li&gt;WebGPU execution provider ships as a separate plugin, so applications that don&amp;rsquo;t need GPU acceleration don&amp;rsquo;t pay the size cost&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The C# SDK now targets lower framework versions for broader .NET compatibility.&lt;/p&gt;
&lt;h2 id="why-this-matters"&gt;Why This Matters&lt;/h2&gt;
&lt;p&gt;The three capabilities together — transcription, embeddings, tool calling — cover the core building blocks of many AI applications. Running them locally means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No internet required&lt;/li&gt;
&lt;li&gt;No per-token costs&lt;/li&gt;
&lt;li&gt;No data leaving the machine&lt;/li&gt;
&lt;li&gt;Consistent latency regardless of network conditions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Foundry Local is a good fit for edge scenarios, privacy-sensitive workloads, offline applications, and any project where you want to avoid a cloud dependency during development.&lt;/p&gt;
&lt;p&gt;Original post: &lt;a href="https://devblogs.microsoft.com/foundry/foundry-local-v1-1/"&gt;Foundry Local 1.1: Live Transcription, Embeddings, and Responses API&lt;/a&gt;&lt;/p&gt;</content:encoded></item></channel></rss>