
Phi 4 reasoning

Available directly through the EU router or as a dedicated GPU deployment. Data stays in Europe.

Important: To take full advantage of the model's capabilities, run inference with temperature=0.8, top_k=50, top_p=0.95, and do_sample=True. For more complex queries, set max_new_tokens=32768 to allow a longer chain-of-thought (CoT).
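A rough sketch of how these settings might be sent through an OpenAI-compatible chat endpoint. The names in the note are Hugging Face generation arguments, so this example assumes the usual OpenAI-style equivalents (temperature, top_p, max_tokens) and assumes the router forwards top_k as a vLLM-style extra field. There is no do_sample field in the OpenAI schema; sampling is assumed to follow from a non-zero temperature. The helper name build_reasoning_payload is hypothetical.

```python
import json

MODEL_ID = "microsoft/Phi-4-reasoning"


def build_reasoning_payload(prompt: str, max_tokens: int = 4096) -> dict:
    """Build a chat-completions body using the recommended sampling settings."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.8,  # recommended value from the note
        "top_p": 0.95,       # recommended value from the note
        "top_k": 50,         # assumption: accepted as a vLLM-style extra field
        # Assumption: max_tokens is the OpenAI-style name for max_new_tokens.
        "max_tokens": max_tokens,
    }


payload = build_reasoning_payload(
    "How many primes are there below 100?", max_tokens=8192
)
print(json.dumps(payload, indent=2))
```

The note recommends up to 32768 new tokens for complex queries. Since the listed context window is also 32,768 tokens, the prompt and the completion presumably share that budget, so a somewhat lower max_tokens may be needed in practice.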

microsoft/Phi-4-reasoning · vLLM ready
text->text · microsoft · EU-hosted
Parameters: 15B
Context window: 33K
Minimum VRAM: 34 GB
POST /api/v1/chat/completions · 200 OK

Specifications

Parameters: 15B
Context window: 32,768 tokens
Minimum VRAM: 34 GB
Architecture: Phi3ForCausalLM (vLLM)
License: MIT
Modality: text->text
Released: April 2025
Publisher: microsoft

Pricing

Input (per 1M tokens): €0.15
Output (per 1M tokens): €0.25

Shared EU router, pay-per-token, scale-to-zero. Dedicated GPU deployments are billed per hour; see pricing.

✓ Verified working on 19 June 2026: response in 1758 ms on our EU infrastructure.

Call it directly

Drop-in replacement for OpenAI: change only the base URL and the API key. The Anthropic format (/v1/messages) is also supported.

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/Phi-4-reasoning",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
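For readers who prefer Python, the same request can be sketched with the standard library only. The URL, model ID, and headers are taken from the curl example; the environment variable HYAI_API_KEY, the helper names, and the OpenAI-style response shape read in send are assumptions.

```python
import json
import os
import urllib.request

API_URL = "https://hostyourai.com/api/v1/chat/completions"  # from the curl example


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({
        "model": "microsoft/Phi-4-reasoning",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    # Assumption: the response follows the OpenAI chat-completions shape.
    return data["choices"][0]["message"]["content"]


# HYAI_API_KEY is a hypothetical variable name; the fallback is the
# placeholder key from the curl example and will not authenticate.
request = build_chat_request("Hello!", os.environ.get("HYAI_API_KEY", "hyai-..."))
print(request.get_method(), request.full_url)
```

Calling send(request) performs the actual network call; it is kept separate so the request can be inspected or logged first.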

Frequently asked questions

Can I run Phi 4 reasoning in the EU?

Yes. HostYourAI runs Phi 4 reasoning on GPUs in European data centers via vLLM. Prompts and outputs do not leave the EU, and no US cloud provider is in the chain.

Is hosting Phi 4 reasoning GDPR-compliant?

Yes. All processing takes place within the EU, a data processing agreement (DPA) is available, and the subprocessor list is public. Open-source weights also mean no training on your data.

What does Phi 4 reasoning cost?

Through the shared EU router you pay €0.15 per million input tokens and €0.25 per million output tokens, with no fixed costs. For high volumes or isolation, you can also run Phi 4 reasoning as a dedicated GPU instance billed per hour.
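The per-token rates can be turned into a quick estimate. This is a minimal sketch using only the two listed router prices; the function name and the example token counts are illustrative, and dedicated per-hour pricing is not modelled.

```python
from decimal import Decimal

INPUT_EUR_PER_MILLION = Decimal("0.15")   # listed input price
OUTPUT_EUR_PER_MILLION = Decimal("0.25")  # listed output price
ONE_MILLION = Decimal(1_000_000)


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the shared-router cost in euros for a single request."""
    total = (
        Decimal(input_tokens) * INPUT_EUR_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_EUR_PER_MILLION
    )
    return total / ONE_MILLION


# Illustrative request: a 2,000-token prompt with a 30,000-token reasoning trace.
cost = estimate_cost_eur(2_000, 30_000)
print(f"EUR {cost:.6f}")  # prints "EUR 0.007800"
```

Reasoning models tend to spend most of their tokens on output, so the output rate usually dominates the bill.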

Is the API compatible with OpenAI?

Yes. You use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is also supported as a drop-in.
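A minimal sketch of the base-URL swap with the official openai Python package (a third-party dependency that must be installed separately). The base URL and model ID come from this page; the environment variable HYAI_API_KEY and the helper names are hypothetical.

```python
import os

BASE_URL = "https://hostyourai.com/api/v1"  # base URL given in the answer above


def client_settings() -> dict:
    """Return the only two settings that differ from a stock OpenAI setup."""
    return {
        "base_url": BASE_URL,
        # Hypothetical variable name; the fallback placeholder will not authenticate.
        "api_key": os.environ.get("HYAI_API_KEY", "hyai-..."),
    }


def ask(prompt: str) -> str:
    """Send one user message and return the reply text (performs a network call)."""
    from openai import OpenAI  # imported lazily so the sketch loads without the SDK

    client = OpenAI(**client_settings())
    completion = client.chat.completions.create(
        model="microsoft/Phi-4-reasoning",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content


print(client_settings()["base_url"])
```

With a valid key in place, ask("Hello!") should behave like the curl request, since the rest of the client code is the same as for OpenAI itself.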

Other models from Microsoft

FastContext 1.0 4B RL

FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues parallel read-only tool calls (READ, GLOB, GREP), and returns compact file paths and line ranges as focused context.

4B · 262K context

FastContext 1.0 4B SFT

FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues parallel read-only tool calls (READ, GLOB, GREP), and returns compact file paths and line ranges as focused context.

4B · 262K context

X Reasoner 7B

We introduce X-Reasoner, a vision-language model post-trained solely on general-domain text for generalizable reasoning, using a two-stage approach: an initial supervised fine-tuning phase with distilled long chain-of-thoughts, followed by reinforcement learning with verifiable rewards. Experiments show that X-Reasoner successfully transfers reasoning capabilities to both multimodal and out-of-domain settings, outperforming existing state-of-the-art models trained with in-domain and multimodal data across various general and medical benchmarks. More details can be found in the paper: X-Reasoner: T

8.3B · 128K context

FrogBoss 32B 2510

FrogBoss is built on the Qwen3-32B transformer architecture with a maximum context length of 64k tokens. The model uses multi-turn debugging workflows and complex code reasoning. Unlike general-purpose LLMs, FrogBoss is specialized for software engineering tasks.

32B · 41K context

OptiMind SFT

OptiMind-SFT is a specialized 20B parameter model designed to bridge the gap between natural language and executable optimization solvers. It automates the translation of complex decision-making problems—such as supply chain planning, scheduling, and resource allocation—into correct MILP formulations.

21B · 131K context

Fara 7B

Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.

8.3B · 128K context

Try Phi 4 reasoning for free

Creating an account takes a minute. Test Phi 4 reasoning directly in the playground.
