gemma 3 1b pt

Run it instantly via the EU router or as a dedicated GPU deployment. Data stays in Europe.

gemma 3 1b pt is the pre-trained (pt) variant of Google's open-weights Gemma 3 language model with 1B parameters, hosted on EU GPUs via an OpenAI-compatible API.

google/gemma-3-1b-pt · vLLM ready
text->text · google · EU-hosted
Parameters: 1B
Context window: not listed
Minimum VRAM: 8 GB
POST /api/v1/chat/completions → 200 OK

Specifications

Parameters: 1B
Minimum VRAM: 8 GB
Architecture: Gemma3ForCausalLM (vLLM)
License: gemma
Modality: text->text
Released: February 2025
Publisher: google

Pricing

€0.03
Input (per 1M tokens)
€0.06
Output (per 1M tokens)

The shared EU router is pay-per-token and scales to zero. Dedicated GPU deployments are billed hourly; see the pricing page.

✓ Verified working on 20-06-2026 — responded in 365 ms on our EU infrastructure.

Call it now

Drop-in replacement for OpenAI: change only the base URL and API key. The Anthropic format (/v1/messages) is supported too.

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-3-1b-pt",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
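
The same request can be sketched in Python using only the standard library. This is a minimal sketch rather than official client code: the helper names `build_chat_request` and `send` are hypothetical, the API key is a placeholder, and the response is assumed to follow the OpenAI chat completions shape (`choices[0].message.content`) that this page says is supported.

```python
import json
import urllib.request

BASE_URL = "https://hostyourai.com/api/v1"  # base URL given on this page


def build_chat_request(api_key: str, prompt: str,
                       model: str = "google/gemma-3-1b-pt") -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the first choice's text.

    Assumption: the response follows the OpenAI chat completions format.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `send(build_chat_request("hyai-...", "Hello!"))` performs the actual network call; the key is the same placeholder used in the curl example and has to be replaced with a real one.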

Frequently asked questions

Can I run gemma 3 1b pt in the EU?

Yes. HostYourAI runs gemma 3 1b pt on GPUs in European datacenters via vLLM. Prompts and outputs never leave the EU and there is no US cloud provider in the chain.

Is hosting gemma 3 1b pt GDPR-compliant?

Yes. All processing happens inside the EU, a Data Processing Agreement (DPA) is available, and the subprocessor list is public. Open weights also mean no training on your data.

How much does gemma 3 1b pt cost?

Via the shared EU router you pay €0.03 per million input tokens and €0.06 per million output tokens, with no fixed costs. For high volume or isolation you can also run gemma 3 1b pt as a dedicated hourly GPU instance.
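
As a rough illustration of the pay-per-token arithmetic, the sketch below multiplies token counts by the two router prices quoted above. The function name `estimate_cost_eur` is hypothetical, and the sketch assumes billing is strictly linear in tokens with no rounding or minimum charge, which this page does not state.

```python
from decimal import Decimal

# Shared-router prices quoted on this page, in euros per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.03")
OUTPUT_PRICE_PER_M = Decimal("0.06")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the pay-per-token cost in euros for one workload."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M / TOKENS_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M / TOKENS_PER_M
    return input_cost + output_cost
```

For example, `estimate_cost_eur(2_000_000, 500_000)` comes to €0.09: €0.06 for the input tokens plus €0.03 for the output tokens.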

Is the API OpenAI-compatible?

Yes. You use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is supported as a drop-in as well.
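
For the Anthropic-style route, a request body might look like the sketch below. Several things here are assumptions: that the `/v1/messages` path mentioned on this page sits under the same base URL, that the endpoint accepts the standard Messages fields (`model`, `max_tokens`, `messages`), and the helper name `build_messages_payload`, which is hypothetical. Authentication headers are left out because this page does not say which ones the Messages route expects.

```python
import json

BASE_URL = "https://hostyourai.com/api/v1"  # base URL given on this page

# Assumption: the /v1/messages path lives under the same base URL.
MESSAGES_URL = f"{BASE_URL}/messages"


def build_messages_payload(prompt: str, max_tokens: int = 256,
                           model: str = "google/gemma-3-1b-pt") -> dict:
    """Build a request body in the Anthropic Messages format.

    The Messages format requires max_tokens in addition to model and messages.
    """
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


# Serialized JSON body, ready to be sent with an HTTP client of your choice.
body = json.dumps(build_messages_payload("Hello!"))
```

The main practical difference from the OpenAI-style body is the required `max_tokens` field; the message list itself has the same role/content shape for a simple text prompt.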

More models from Google

gemma 4 31B it qat w4a16 ct · 34B parameters
gemma 4 26B A4B it qat q4 0 unquantized · 27B parameters
gemma 4 31B it qat q4 0 unquantized · 33B parameters

Note: This model card is for the new versions of the Gemma 4 family optimized with Quantization-Aware Training (QAT), which preserves quality similar to bfloat16 while dramatically reducing the memory required to load the model. Four versions of the QAT checkpoints are available:

Unquantized QAT checkpoints (Q4_0): Half-precision weights extracted from the QAT pipeline, ideal for custom downstream compilation and research. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B, and their drafter models.
GGUF (Q4_0): Ready-to-deploy formats for broad ecosystem compatibility. [...]

gemma 4 31B · 33B parameters
gemma 4 26B A4B · 27B parameters
gemma 4 26B A4B it · 27B parameters

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.

Try gemma 3 1b pt for free

Creating an account takes a minute. Test gemma 3 1b pt straight away in the playground.