Phi 4 reasoning plus

Available instantly via the EU router or as a dedicated GPU deployment. Data stays in Europe.

Important: To fully take advantage of the model's capabilities, inference must use temperature=0.8, top_k=50, top_p=0.95, and do_sample=True. For more complex queries, set max_new_tokens=32768 to allow for longer chain-of-thought (CoT).
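The settings above are named after Hugging Face `generate` arguments. A minimal sketch of how they might map onto an OpenAI-style chat request body follows. It assumes that the output-length limit is sent as `max_tokens`, that sampling is active whenever the temperature is above zero (so `do_sample=True` needs no field of its own), and that `top_k` is accepted as a vLLM-style extension field. Whether the shared router forwards `top_k` is not stated on this page. `build_payload` is a hypothetical helper, and its default of 4096 output tokens is an arbitrary illustration.

```python
import json

# Sampling settings recommended for Phi-4-reasoning-plus.
RECOMMENDED_SAMPLING = {
    "temperature": 0.8,
    "top_p": 0.95,
    # Assumption: top_k is a vLLM-style extension, not part of the OpenAI schema.
    "top_k": 50,
}


def build_payload(prompt: str, max_tokens: int = 4096) -> dict:
    """Build a chat-completions request body with the recommended sampling settings."""
    return {
        "model": "microsoft/Phi-4-reasoning-plus",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        **RECOMMENDED_SAMPLING,
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent with -d in a curl call.
    print(json.dumps(build_payload("Hello!"), indent=2))
```

Raising `max_tokens` gives the model more room for chain-of-thought on harder queries, at the cost of more billed output tokens.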

microsoft/Phi-4-reasoning-plus · vLLM ready
text->text · microsoft · EU-hosted
Parameters 15B
Context window 33K
Minimum VRAM 34 GB
POST /api/v1/chat/completions → 200 OK

Specifications

Parameters 15B
Context window 32,768 tokens
Minimum VRAM 34 GB
Architecture Phi3ForCausalLM (vLLM)
License MIT
Modality text->text
Released April 2025
Publisher microsoft
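Because the context window is 32,768 tokens, a completion budget has to leave room for the prompt. The sketch below assumes that prompt and completion tokens share the same window, which is the usual behaviour for vLLM deployments but is not confirmed on this page. `clamp_max_tokens` is a hypothetical helper, and the prompt token count is expected to come from a tokenizer that is not shown here.

```python
CONTEXT_WINDOW = 32_768  # tokens, taken from the specifications above


def clamp_max_tokens(prompt_tokens: int, requested: int, window: int = CONTEXT_WINDOW) -> int:
    """Return the largest completion budget that still fits in the context window."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = window - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    # Never ask for more tokens than the window has left after the prompt.
    return min(requested, remaining)


if __name__ == "__main__":
    # A 1,200-token prompt leaves 31,568 tokens for the answer.
    print(clamp_max_tokens(prompt_tokens=1_200, requested=32_768))
```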

Pricing

€0.15
Input (per 1M tokens)
€0.25
Output (per 1M tokens)

The shared EU router is pay-per-token and scales to zero. Dedicated GPU deployments are billed hourly; see the pricing page.
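As a rough illustration of the per-token rates above, the following sketch estimates the cost of a single request on the shared router. It assumes that chain-of-thought tokens are billed as ordinary output tokens, which this page does not state. `estimate_cost_eur` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# Shared-router rates from the pricing section, in euros per 1M tokens.
INPUT_EUR_PER_M = Decimal("0.15")
OUTPUT_EUR_PER_M = Decimal("0.25")
MILLION = Decimal(1_000_000)


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the pay-per-token cost of one request in euros."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_EUR_PER_M + output_tokens * OUTPUT_EUR_PER_M
    return total / MILLION


if __name__ == "__main__":
    # 2,000 prompt tokens plus a long 30,000-token reasoning answer.
    print(estimate_cost_eur(2_000, 30_000))
```

Using `Decimal` rather than floats keeps small per-request amounts exact when they are summed over many calls.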

✓ Verified working on 19-06-2026 — responded in 1734 ms on our EU infrastructure.

Call it now

Drop-in replacement for OpenAI: change only the base URL and API key. The Anthropic format (/v1/messages) is supported too.

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/Phi-4-reasoning-plus",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Frequently asked questions

Can I run Phi 4 reasoning plus in the EU?

Yes. HostYourAI runs Phi 4 reasoning plus on GPUs in European datacenters via vLLM. Prompts and outputs never leave the EU and there is no US cloud provider in the chain.

Is hosting Phi 4 reasoning plus GDPR-compliant?

Yes. All processing happens inside the EU, a Data Processing Agreement (DPA) is available, and the subprocessor list is public. Open-source weights also mean that your data is not used for training.

How much does Phi 4 reasoning plus cost?

Via the shared EU router you pay €0.15 per million input tokens and €0.25 per million output tokens, with no fixed costs. For high volume or isolation you can also run Phi 4 reasoning plus as a dedicated hourly GPU instance.

Is the API OpenAI-compatible?

Yes. You can use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is supported as a drop-in as well.
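For readers who prefer not to install an SDK, here is a minimal sketch of the same call using only the Python standard library. It assumes the response follows the OpenAI chat-completions shape (`choices[0].message.content`). `build_request` and `send` are hypothetical helper names, and the API key is a placeholder.

```python
import json
import urllib.request

BASE_URL = "https://hostyourai.com/api/v1"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the chat-completions endpoint without sending it."""
    body = json.dumps({
        "model": "microsoft/Phi-4-reasoning-plus",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: OpenAI-style response body.
    return payload["choices"][0]["message"]["content"]
```

A call would then look like `send(build_request("hyai-...", "Hello!"))`, with the placeholder replaced by a real key.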

More models from Microsoft

FastContext 1.0 4B RL

FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues parallel read-only tool calls (READ, GLOB, GREP), and returns compact file paths and line ranges as focused context.

4B · 262K context

FastContext 1.0 4B SFT

FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues parallel read-only tool calls (READ, GLOB, GREP), and returns compact file paths and line ranges as focused context.

4B · 262K context

X Reasoner 7B

We introduce X-Reasoner, a vision-language model post-trained solely on general-domain text for generalizable reasoning, using a two-stage approach: an initial supervised fine-tuning phase with distilled long chain-of-thoughts, followed by reinforcement learning with verifiable rewards. Experiments show that X-Reasoner successfully transfers reasoning capabilities to both multimodal and out-of-domain settings, outperforming existing state-of-the-art models trained with in-domain and multimodal data across various general and medical benchmarks. More details can be found in the paper: X-Reasoner: T

8.3B · 128K context

FrogBoss 32B 2510

FrogBoss is built on the Qwen3-32B transformer architecture with a maximum context length of 64k tokens. The model uses multi-turn debugging workflows and complex code reasoning. Unlike general-purpose LLMs, FrogBoss is specialized for software engineering tasks.

32B · 41K context

OptiMind SFT

OptiMind-SFT is a specialized 20B parameter model designed to bridge the gap between natural language and executable optimization solvers. It automates the translation of complex decision-making problems—such as supply chain planning, scheduling, and resource allocation—into correct MILP formulations.

21B · 131K context

Fara 7B

Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.

8.3B · 128K context

Try Phi 4 reasoning plus for free

Creating an account takes a minute. Test Phi 4 reasoning plus straight away in the playground.
