

glm 4 9b chat 1m hf

Name: glm 4 9b chat 1m hf hosting (EU)
Brand: HostYourAI
Price: 0.10 EUR
Availability: InStock

Instantly via the EU router or as a dedicated GPU deployment. Data stays in Europe.

If you are using the weights from this repository, please update to


zai-org/glm-4-9b-chat-1m-hf

text->text · zai-org · EU-hosted

9.5B parameters · 1,049K context window · 22 GB minimum VRAM

POST /api/v1/chat/completions → 200 OK

Specifications

Parameters: 9.5B
Context window: 1,048,576 tokens
Minimum VRAM: 22 GB
Architecture: GlmForCausalLM (vLLM)
License: other
Modality: text->text
Released: October 2024
Publisher: zai-org

Pricing

Input: €0.10 per 1M tokens
Output: €0.18 per 1M tokens

The shared EU router is pay-per-token and scales to zero. Dedicated GPU deployments are billed hourly; see pricing.
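As a rough illustration of the pay-per-token arithmetic, the sketch below estimates what a workload costs on the shared EU router at the prices listed above. The function name and the sample token counts are hypothetical, and actual billing may round or meter differently.

```python
from decimal import Decimal

# Prices quoted on this page for the shared EU router, in EUR per 1M tokens.
INPUT_EUR_PER_MILLION = Decimal("0.10")
OUTPUT_EUR_PER_MILLION = Decimal("0.18")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the pay-per-token cost in EUR (hypothetical helper, not an official API)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_EUR_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_EUR_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    print(estimate_cost_eur(2_000_000, 500_000))
```

For that sample workload the estimate is 2 × €0.10 + 0.5 × €0.18 = €0.29. Decimal is used instead of float so that small per-token amounts do not pick up binary rounding noise.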

Call it now

Drop-in replacement for OpenAI: change only the base URL and API key. The Anthropic format (/v1/messages) is supported too.

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/glm-4-9b-chat-1m-hf",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
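The same request can be sketched in Python using only the standard library. This mirrors the curl call above; the helper names are hypothetical, and the response parsing assumes the reply follows the OpenAI chat completions shape, which this page implies but does not show.

```python
import json
import urllib.request

BASE_URL = "https://hostyourai.com/api/v1"  # base URL given on this page
MODEL = "zai-org/glm-4-9b-chat-1m-hf"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first reply text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

To try it, call `send_chat_request(build_chat_request("Hello!", api_key))` with your own key. Keeping request construction separate from sending makes it easy to inspect the URL, headers, and body before anything goes over the network.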

Frequently asked questions

Can I run glm 4 9b chat 1m hf in the EU?

Yes. HostYourAI runs glm 4 9b chat 1m hf on GPUs in European datacenters via vLLM. Prompts and outputs never leave the EU and there is no US cloud provider in the chain.

Is hosting glm 4 9b chat 1m hf GDPR-compliant?

Yes. All processing happens inside the EU, a Data Processing Agreement (DPA) is available, and the subprocessor list is public. Open-source weights also mean no training on your data.

How much does glm 4 9b chat 1m hf cost?

Via the shared EU router you pay €0.10 per million input tokens and €0.18 per million output tokens, with no fixed costs. For high volume or isolation you can also run glm 4 9b chat 1m hf as a dedicated hourly GPU instance.

Is the API OpenAI-compatible?

Yes. You use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is supported as a drop-in as well.

More models from Z.AI

GLM 5.2

We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include:
- Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
- Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
- Improved Architecture: We propose IndexShare, which reuses the same indexer across every fou

753B · 1M context

GLM 5.2 FP8

753B · 1M context

GLM 5.1 FP8

GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

754B · 203K context

GLM 5.1

754B · 203K context

GLM 5

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.

754B · 203K context

GLM 5 FP8

754B · 203K context

Try glm 4 9b chat 1m hf for free

Creating an account takes a minute. Test glm 4 9b chat 1m hf straight away in the playground.
