GLM Z1 9B 0414

Run GLM Z1 9B 0414 instantly via the shared EU router or as a dedicated GPU deployment. Data stays in Europe.

The GLM family welcomes a new generation of open-source models, the GLM-4-32B-0414 series, featuring 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very user-friendly local deployment features. GLM-4-32B-Ba...

zai-org/GLM-Z1-9B-0414 · vLLM ready
text->text · zai-org · EU-hosted
Parameters: 9.4B
Context window: 33K
Minimum VRAM: 22 GB
POST /api/v1/chat/completions 200 OK

Specifications

Parameters: 9.4B
Context window: 32,768 tokens
Minimum VRAM: 22 GB
Architecture: Glm4ForCausalLM (vLLM)
License: MIT
Modality: text->text
Released: April 2025
Publisher: zai-org

Pricing

Input (per 1M tokens): €0.10
Output (per 1M tokens): €0.18

Shared EU router, pay-per-token, scale-to-zero. Dedicated GPU deployments are billed hourly — see pricing.

✓ Verified working on 28-06-2026 — responded in 2118 ms on our EU infrastructure.

Call it now

Drop-in replacement for OpenAI: change only the base URL and API key. The Anthropic format (/v1/messages) is supported too.

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-Z1-9B-0414",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
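
The same call can be sketched in Python with only the standard library. The URL, model ID, and headers mirror the curl command above; the helper names `build_chat_request` and `send` are illustrative assumptions, not part of any HostYourAI client.

```python
import json
import urllib.request

BASE_URL = "https://hostyourai.com/api/v1"  # base URL given on this page
MODEL = "zai-org/GLM-Z1-9B-0414"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request matching the curl example."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `send(build_chat_request("Hello!", "<your API key>"))` performs the network request; keeping the build step separate makes the payload easy to inspect before anything is sent.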

Frequently asked questions

Can I run GLM Z1 9B 0414 in the EU?

Yes. HostYourAI runs GLM Z1 9B 0414 on GPUs in European datacenters via vLLM. Prompts and outputs never leave the EU and there is no US cloud provider in the chain.

Is hosting GLM Z1 9B 0414 GDPR-compliant?

Yes. All processing happens inside the EU, a Data Processing Agreement (DPA) is available, and the subprocessor list is public. Open-source weights also mean that your data is not used for training.

How much does GLM Z1 9B 0414 cost?

Via the shared EU router you pay €0.10 per million input tokens and €0.18 per million output tokens, with no fixed costs. For high volume or isolation you can also run GLM Z1 9B 0414 as a dedicated hourly GPU instance.
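
As a rough illustration of the pay-per-token arithmetic, the sketch below estimates a bill from token counts using the two router prices quoted above. The function name `estimate_cost_eur` is an assumption made for this example, and the result ignores anything not stated on this page (taxes, rounding rules, dedicated-instance pricing).

```python
from decimal import Decimal

# Shared EU router prices quoted on this page, in euros per 1M tokens.
INPUT_EUR_PER_MILLION = Decimal("0.10")
OUTPUT_EUR_PER_MILLION = Decimal("0.18")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the router cost in euros for the given token counts."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_EUR_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_EUR_PER_MILLION
    return input_cost + output_cost


# 2M input tokens (EUR 0.20) plus 500K output tokens (EUR 0.09) -> EUR 0.29
print(estimate_cost_eur(2_000_000, 500_000))
```

`Decimal` is used instead of floats so that small per-token amounts add up without binary rounding noise.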

Is the API OpenAI-compatible?

Yes. You can use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is supported as a drop-in as well.
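
Because the endpoint follows the OpenAI chat completions format, a response can be read the same way as one from OpenAI. The sketch below pulls the reply text and token usage out of a decoded response body; `extract_reply` is a hypothetical helper, and the sample dictionary is hand-written to show the shape, not real model output.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> tuple[str, int, int]:
    """Return (reply text, prompt tokens, completion tokens) from a response."""
    text = response["choices"][0]["message"]["content"]
    usage = response.get("usage", {})
    return text, usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)


# Hand-written sample in the OpenAI chat completions shape (illustrative only).
sample_response = {
    "model": "zai-org/GLM-Z1-9B-0414",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hi there!"}}
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 3, "total_tokens": 8},
}

reply, prompt_tokens, completion_tokens = extract_reply(sample_response)
```

The two token counts are the numbers that per-token billing is based on, so logging them alongside each reply makes usage easy to reconcile.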

More models from Z.AI

GLM 5.2

We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include:
- Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
- Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
- Improved Architecture: We propose IndexShare, which reuses the same indexer across every fou

753B · 1M context

GLM 5.2 FP8

We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include:
- Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
- Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
- Improved Architecture: We propose IndexShare, which reuses the same indexer across every fou

753B · 1M context

GLM 5.1 FP8

GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

754B · 203K context

GLM 5.1

GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

754B · 203K context

GLM 5

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.

754B · 203K context

GLM 5 FP8

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.

754B · 203K context

Try GLM Z1 9B 0414 for free

Creating an account takes a minute. Test GLM Z1 9B 0414 straight away in the playground.
