Microsoft Foundry Local Brings AI to Your Desktop

Microsoft quietly launched Foundry Local, an open-source tool that runs AI models entirely on your device without cloud dependencies, subscriptions, or authentication. Built on ONNX Runtime with hardware acceleration for CPUs, GPUs, and NPUs, it integrates seamlessly into applications through an OpenAI-compatible API.

Zero Cloud, Maximum Privacy

Unlike cloud-based services, Foundry Local processes everything on-device. No Azure subscription is required, no data leaves your machine, and internet connectivity is optional after the initial model download. This addresses privacy requirements in healthcare, finance, and other regulated industries where sensitive data cannot reach external servers.

Feature             | Foundry Local                              | LM Studio            | Ollama
Model Format        | ONNX only                                  | GGUF, ONNX, others   | GGUF
Model Catalog       | Limited (Phi, Qwen 2.5, Mistral, GPT-OSS)  | Extensive (hundreds) | Extensive (hundreds)
Windows ARM Support | Yes (NPU optimized)                        | Limited              | Limited
GUI                 | No (CLI only)                              | Yes (full featured)  | No (CLI only)
API Compatibility   | OpenAI                                     | OpenAI               | OpenAI

Installation Takes Seconds

Getting started takes a single terminal command on each platform, plus a quick verification step:

Windows Installation

  1. Open PowerShell or Command Prompt
  2. Run: winget install Microsoft.FoundryLocal
  3. Verify installation: foundry --version

macOS Installation

  1. Open Terminal
  2. Run: brew install microsoft/foundrylocal/foundrylocal
  3. Confirm setup: foundry service status

Running Your First Model

After installation, running models involves three simple steps:

  1. List available models: foundry model list displays the options compatible with your hardware
  2. Start a model: foundry model run qwen2.5-0.5b downloads the model on first use and launches an interactive session
  3. Interact: type prompts directly in the terminal, or integrate through the REST API at http://localhost:5273 (a quick connectivity check follows below)
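
Before wiring the API into an application, it can help to confirm the service is reachable. A minimal Python sketch follows; it assumes the standard OpenAI-compatible /v1/models route (implied by the API compatibility above, not separately documented here) and that the requests package is installed:

# Quick sketch: verify the local endpoint is up by listing the models it serves.
# The /v1/models route is assumed from the service's OpenAI compatibility;
# adjust the port if your installation reports a different one.
import requests

resp = requests.get("http://localhost:5273/v1/models", timeout=5)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])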

Sample API Request

curl http://localhost:5273/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-0.5b",
    "messages": [{"role": "user", "content": "Explain async programming"}]
  }'
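
Because the API is OpenAI-compatible, application code can also use the official openai Python SDK instead of raw HTTP. A minimal sketch, assuming the package is installed and qwen2.5-0.5b is already running via foundry model run; the API key is a placeholder, since Foundry Local requires no authentication but the SDK insists on a value:

# Minimal sketch: the same chat request through the official openai Python SDK.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5273/v1",  # point the SDK at the local service
    api_key="not-needed",                 # placeholder; ignored by Foundry Local
)

response = client.chat.completions.create(
    model="qwen2.5-0.5b",  # use the model ID reported by `foundry model list`
    messages=[{"role": "user", "content": "Explain async programming"}],
)
print(response.choices[0].message.content)

Swapping the base_url back to OpenAI's hosted endpoint is the only change needed to move between local and cloud inference, which is what makes the local-fallback pattern mentioned later practical.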

Limited Model Selection Raises Questions

Critics highlight Foundry Local’s restricted catalog compared to competitors. Current models include Microsoft’s Phi family (Phi-3, Phi-4), Qwen 2.5 series, Mistral variants, and OpenAI’s GPT-OSS-20B. Notably absent: Llama 3+, Qwen 3, Gemma 2, and other recent releases.

Why the Gap?

  • ONNX requirement: Models need conversion from PyTorch/GGUF, adding friction
  • Optimization focus: Microsoft prioritizes deep hardware integration over breadth
  • Preview status: Catalog expansion expected before general availability

Developers can convert custom models using Microsoft Olive, though this requires technical expertise and time.

Windows ARM Gets Premium Treatment

Foundry Local excels on Windows ARM devices like Snapdragon X Elite laptops, where most competitors struggle. Pre-optimized models leverage NPUs (Neural Processing Units) for power-efficient inference, a rare advantage on an otherwise underserved platform.

When to Choose Foundry Local

Best For:

  • Windows ARM users needing NPU acceleration
  • Enterprise teams requiring Microsoft ecosystem integration
  • Developers building OpenAI-compatible apps with local fallback
  • Projects demanding maximum data sovereignty

Avoid If You Need:

  • Extensive model variety (use LM Studio instead)
  • Graphical user interface (LM Studio has one built in; Ollama pairs with Open WebUI)
  • Latest model releases the day they drop
  • Linux support (not yet available)

Microsoft entered the local AI race years after established players, but Foundry Local offers distinct advantages: bulletproof Windows integration, enterprise-grade ONNX optimization, and unmatched ARM support. The limited model catalog and CLI-only interface will deter casual users, but for developers building production apps on Windows, especially ARM-based Copilot+ PCs, it deserves serious consideration.

Full documentation and starter projects are available at foundrylocal.ai and the GitHub repository.