
Model garden

Phi 3 medium 128k instruct


Available instantly via the shared EU router or as a dedicated GPU deployment. Data stays in Europe.

🎉 Phi-3.5: [mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct); [MoE-instruct](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct); [vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)


microsoft/Phi-3-medium-128k-instruct

text->text · microsoft · EU-hosted


POST /api/v1/chat/completions → 200 OK

Specifications

Parameters: 14B
Context window: 131,072 tokens
Minimum VRAM: 33 GB
Architecture: Phi3ForCausalLM (vLLM)
License: MIT
Modality: text->text
Released: May 2024
Publisher: microsoft

Pricing

Input (per 1M tokens): €0.15
Output (per 1M tokens): €0.25

These prices apply to the shared EU router, which is pay-per-token and scales to zero. Dedicated GPU deployments are billed hourly; see the pricing page.

Call it now

Drop-in replacement for OpenAI: change only the base URL and API key. The Anthropic format (/v1/messages) is supported too.

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/Phi-3-medium-128k-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
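For callers who prefer Python, the same request can be sketched with only the standard library. This is a minimal illustration, not an official client: the base URL and model ID come from this page, while the function names `build_chat_request` and `send` are hypothetical helpers introduced here, and the API key is a placeholder you must supply.

```python
import json
import urllib.request

# Base URL and model ID are copied from this page.
BASE_URL = "https://hostyourai.com/api/v1"
MODEL = "microsoft/Phi-3-medium-128k-instruct"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST that the curl command issues."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs network and a valid key)."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any tokens are billed.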

Frequently asked questions

Can I run Phi 3 medium 128k instruct in the EU?

Yes. HostYourAI runs Phi 3 medium 128k instruct on GPUs in European datacenters via vLLM. Prompts and outputs never leave the EU and there is no US cloud provider in the chain.

Is hosting Phi 3 medium 128k instruct GDPR-compliant?

Yes. All processing happens inside the EU, a Data Processing Agreement (DPA) is available and the subprocessor list is public. Open-source weights also mean: no training on your data.

How much does Phi 3 medium 128k instruct cost?

Via the shared EU router you pay €0.15 per million input tokens and €0.25 per million output tokens, with no fixed costs. For high volume or isolation you can also run Phi 3 medium 128k instruct as a dedicated hourly GPU instance.
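The per-token arithmetic can be sketched as a small helper. The two rates are the shared-router prices quoted above; the function name `estimate_cost_eur` is hypothetical, and the sketch ignores dedicated hourly instances, whose rates are not listed here.

```python
# Shared EU router prices quoted on this page, in EUR per 1M tokens.
INPUT_EUR_PER_MILLION = 0.15
OUTPUT_EUR_PER_MILLION = 0.25


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> float:
    """Estimate the pay-per-token cost in EUR for one workload."""
    total = (input_tokens * INPUT_EUR_PER_MILLION
             + output_tokens * OUTPUT_EUR_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # 2M input tokens and 0.5M output tokens: 0.30 + 0.125 EUR.
    print(f"{estimate_cost_eur(2_000_000, 500_000):.3f} EUR")
```

For example, 2 million input tokens and 500,000 output tokens come to roughly €0.43.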

Is the API OpenAI-compatible?

Yes. You use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is supported as a drop-in as well.
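If the endpoint follows the OpenAI chat completions response shape, as the compatibility claim suggests, the reply text and token counts can be pulled out as in the sketch below. The field names (`choices`, `message`, `usage`, `prompt_tokens`, `completion_tokens`) are those of the OpenAI format and are assumed, not confirmed, for this service; `summarize_completion` is a hypothetical helper.

```python
def summarize_completion(response: dict) -> dict:
    """Extract reply text and token usage from an OpenAI-style response dict."""
    choice = response["choices"][0]
    # "usage" is treated as optional, so missing counts default to 0.
    usage = response.get("usage") or {}
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "input_tokens": usage.get("prompt_tokens", 0),
        "output_tokens": usage.get("completion_tokens", 0),
    }
```

The token counts returned here are the inputs you would need to reconcile a response against the per-million-token prices.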

More models from Microsoft

FastContext 1.0 4B RL

FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues parallel read-only tool calls (READ, GLOB, GREP), and returns compact file paths and line ranges as focused context.

4B · 262K context

FastContext 1.0 4B SFT

4B · 262K context

X Reasoner 7B

We introduce X-Reasoner, a vision-language model post-trained solely on general-domain text for generalizable reasoning, using a two-stage approach: an initial supervised fine-tuning phase with distilled long chain-of-thoughts, followed by reinforcement learning with verifiable rewards. Experiments show that X-Reasoner successfully transfers reasoning capabilities to both multimodal and out-of-domain settings, outperforming existing state-of-the-art models trained with in-domain and multimodal data across various general and medical benchmarks. More details can be found in the paper: X-Reasoner: T

8.3B · 128K context

FrogBoss 32B 2510

FrogBoss is built on the Qwen3-32B transformer architecture with a maximum context length of 64k tokens. The model uses multi-turn debugging workflows and complex code reasoning. Unlike general-purpose LLMs, FrogBoss is specialized for software engineering tasks.

32B · 41K context

OptiMind SFT

OptiMind-SFT is a specialized 20B parameter model designed to bridge the gap between natural language and executable optimization solvers. It automates the translation of complex decision-making problems—such as supply chain planning, scheduling, and resource allocation—into correct MILP formulations.

21B · 131K context

Fara 7B

Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.

8.3B · 128K context

Try Phi 3 medium 128k instruct for free

Creating an account takes a minute. Test Phi 3 medium 128k instruct straight away in the playground.
