
Phi 3 mini 128k instruct

Run it instantly via the shared EU router or as a dedicated GPU deployment. Data stays in Europe.


microsoft/Phi-3-mini-128k-instruct · vLLM-ready
text->text · microsoft · EU-hosted
3.8B parameters · 131K context window · 9 GB minimum VRAM
POST /api/v1/chat/completions · 200 OK

Specifications

Parameters: 3.8B
Context window: 131,072 tokens (128K)
Minimum VRAM: 9 GB
Architecture: Phi3ForCausalLM (vLLM)
License: MIT
Modality: text->text
Released: April 2024
Publisher: Microsoft
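As a rough sanity check on the 9 GB minimum, assume the weights are served in 16-bit precision (BF16 or FP16, 2 bytes per parameter). The page does not state the serving precision, so treat this as an estimate only:

```latex
3.8 \times 10^{9}\ \text{parameters} \times 2\ \tfrac{\text{bytes}}{\text{parameter}}
  = 7.6 \times 10^{9}\ \text{bytes} \approx 7.6\ \text{GB}
```

Under that assumption, about 1.4 GB of the 9 GB minimum remains for the KV cache and runtime overhead, so prompts approaching the full 131K-token window will likely need more memory than the minimum.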

Pricing

Input: €0.05 per 1M tokens
Output: €0.10 per 1M tokens

These rates apply to the shared EU router: pay-per-token, with scale-to-zero. Dedicated GPU deployments are billed hourly; see the pricing page for those rates.
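Router costs scale linearly with token counts. As a worked example, take an illustrative month of 10 million input tokens and 2 million output tokens (volumes chosen only for the example):

```latex
\text{cost (EUR)} = 0.05 \cdot \frac{N_{\text{in}}}{10^{6}} + 0.10 \cdot \frac{N_{\text{out}}}{10^{6}}
                  = 0.05 \cdot 10 + 0.10 \cdot 2
                  = 0.50 + 0.20
                  = 0.70
```

That month would cost €0.70 on the shared router, with no fixed fee on top.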

✓ Verified working on 27-06-2026: responded in 1,982 ms on our EU infrastructure.

Call it now

The API is a drop-in replacement for OpenAI's: change only the base URL and API key. The Anthropic Messages format (/v1/messages) is supported too.

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/Phi-3-mini-128k-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
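A minimal sketch of the same request in the Anthropic Messages format follows. It assumes the endpoint lives at /api/v1/messages under the documented base URL and that the router accepts the same Bearer token as the chat endpoint; neither detail is confirmed on this page. HYAI_API_KEY is a hypothetical environment variable name for your key. max_tokens is included because the Anthropic Messages format requires it.

```shell
#!/usr/bin/env bash
# Anthropic Messages format request (sketch).
# Assumptions: path /api/v1/messages and Bearer auth, mirroring the
# OpenAI-style example; HYAI_API_KEY is a hypothetical variable name.
set -euo pipefail

BASE_URL="https://hostyourai.com/api/v1"

# Messages-format body: max_tokens is mandatory in this format.
PAYLOAD='{
  "model": "microsoft/Phi-3-mini-128k-instruct",
  "max_tokens": 256,
  "messages": [{"role": "user", "content": "Hello!"}]
}'

# Only send the request when a key is available; otherwise just build the body.
if [ -n "${HYAI_API_KEY:-}" ]; then
  curl "$BASE_URL/messages" \
    -H "Authorization: Bearer $HYAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
fi
```

Export HYAI_API_KEY before running the script; without it, the script only prepares the request body.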

Frequently asked questions

Can I run Phi 3 mini 128k instruct in the EU?

Yes. HostYourAI runs Phi 3 mini 128k instruct on GPUs in European datacenters via vLLM. Prompts and outputs never leave the EU, and there is no US cloud provider in the chain.

Is hosting Phi 3 mini 128k instruct GDPR-compliant?

Yes. All processing happens inside the EU, a Data Processing Agreement (DPA) is available, and the subprocessor list is public. And because the model runs from open-source weights, your data is not used for training.

How much does Phi 3 mini 128k instruct cost?

Via the shared EU router you pay €0.05 per million input tokens and €0.10 per million output tokens, with no fixed costs. For high volumes or workload isolation, you can also run Phi 3 mini 128k instruct on a dedicated GPU instance billed hourly.

Is the API OpenAI-compatible?

Yes. You use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is supported as a drop-in as well.
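One way to point the official OpenAI SDKs at the EU router without touching application code is through environment variables: the Python (v1+) and Node.js (v4+) SDKs read OPENAI_BASE_URL and OPENAI_API_KEY when no base URL or key is passed explicitly. HYAI_API_KEY is a hypothetical variable name for your key, and the fallback value is only a placeholder.

```shell
#!/usr/bin/env bash
# Configure the official OpenAI SDKs via environment variables.
# OPENAI_BASE_URL / OPENAI_API_KEY are read by the Python and Node.js SDKs
# when the client is constructed without explicit arguments.
# HYAI_API_KEY is a hypothetical variable; "hyai-..." is a placeholder.
export OPENAI_BASE_URL="https://hostyourai.com/api/v1"
export OPENAI_API_KEY="${HYAI_API_KEY:-hyai-...}"
```

A program started from this shell that creates a default client (for example `OpenAI()` in Python) and requests model "microsoft/Phi-3-mini-128k-instruct" then talks to the EU router unchanged.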

More models from Microsoft

FastContext 1.0 4B RL

FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues parallel read-only tool calls (READ, GLOB, GREP), and returns compact file paths and line ranges as focused context.

4B parameters · 262K context
FastContext 1.0 4B SFT

FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues parallel read-only tool calls (READ, GLOB, GREP), and returns compact file paths and line ranges as focused context.

4B parameters · 262K context
X Reasoner 7B

We introduce X-Reasoner, a vision-language model post-trained solely on general-domain text for generalizable reasoning, using a two-stage approach: an initial supervised fine-tuning phase with distilled long chains of thought, followed by reinforcement learning with verifiable rewards. Experiments show that X-Reasoner successfully transfers reasoning capabilities to both multimodal and out-of-domain settings, outperforming existing state-of-the-art models trained with in-domain and multimodal data across various general and medical benchmarks. More details can be found in the paper: X-Reasoner: T…

8.3B parameters · 128K context
FrogBoss 32B 2510

FrogBoss is built on the Qwen3-32B transformer architecture with a maximum context length of 64K tokens. The model is designed for multi-turn debugging workflows and complex code reasoning: unlike general-purpose LLMs, FrogBoss is specialized for software engineering tasks.

32B parameters · 41K context
OptiMind SFT

OptiMind-SFT is a specialized 20B-parameter model designed to bridge the gap between natural language and executable optimization solvers. It automates the translation of complex decision-making problems, such as supply chain planning, scheduling, and resource allocation, into correct mixed-integer linear programming (MILP) formulations.

21B parameters · 131K context
Fara 7B

Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.

8.3B parameters · 128K context

Try Phi 3 mini 128k instruct for free

Creating an account takes a minute. Test Phi 3 mini 128k instruct straight away in the playground.
