382 open-source modellen, gehost op GPU's in de EU. Eén OpenAI-compatibele API-key, scale-to-zero of dedicated.
382 modellen
We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include: - Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work - Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency - Improved Architecture: We propose IndexShare, which reuses the same indexer across every fou
We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include: - Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work - Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency - Improved Architecture: We propose IndexShare, which reuses the same indexer across every fou
FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues parallel read-only tool calls (READ, GLOB, GREP), and returns compact file paths and line ranges as focused context.
FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues parallel read-only tool calls (READ, GLOB, GREP), and returns compact file paths and line ranges as focused context.
[!Note] This model card is for the new versions of the Gemma 4 family optimized with Quantization-Aware Training (QAT), which allows preserving similar quality to bfloat16 while dramatically reducing the memory requirements to load the model. Four versions of the QAT checkpoints are available: Unquantized QAT checkpoints (Q40): Half-precision weights extracted from the QAT pipeline, ideal for custom downstream compilation and research. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B, and their drafter models. GGUF (Q40): Ready-to-deploy formats for broad ecosystem compatibility. Availabl
[!Note] This model card is for the new versions of the Gemma 4 family optimized with Quantization-Aware Training (QAT), which allows preserving similar quality to bfloat16 while dramatically reducing the memory requirements to load the model. Four versions of the QAT checkpoints are available: Unquantized QAT checkpoints (Q40): Half-precision weights extracted from the QAT pipeline, ideal for custom downstream compilation and research. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B, and their drafter models. GGUF (Q40): Ready-to-deploy formats for broad ecosystem compatibility. Availabl
[!Note] This model card is for the new versions of the Gemma 4 family optimized with Quantization-Aware Training (QAT), which allows preserving similar quality to bfloat16 while dramatically reducing the memory requirements to load the model. Four versions of the QAT checkpoints are available: Unquantized QAT checkpoints (Q40): Half-precision weights extracted from the QAT pipeline, ideal for custom downstream compilation and research. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B, and their drafter models. GGUF (Q40): Ready-to-deploy formats for broad ecosystem compatibility. Availabl
We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens.
We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens.
[!Note] This repository contains FP8-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original model.
[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
[!Note] This repository contains FP8-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original model.
[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).
GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).
Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.
Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.
Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.
Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.
[!Note] This repository contains int4-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
[!Note] This repository contains int4-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
[!Note] This repository contains int4-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
[!Note] This repository contains model weights and configuration files for the pre-trained only model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, etc. The intended use cases are fine-tuning, in-context learning experiments, and other research or development purposes, not direct interaction. However, the control tokens, e.g., <|imstart| and <|imend| were trained to allow efficient LoRA-style PEFT with the official chat template, mitigating the need to finetune embeddings, a significant optimization given Qwen3.5's larger
[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. In light of its parameter scale, the intended use cases are prototyping, task-specific fine-tuning, and other research or development purposes.
[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. In light of its parameter scale, the intended use cases are prototyping, task-specific fine-tuning, and other research or development purposes.
[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
[!Note] This repository contains FP8-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original model.
[!Note] This repository contains FP8-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original model.
[!Note] This repository contains FP8-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original model.
[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
[!Note] This repository contains FP8-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original model.
[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.
We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.
We introduce X-Reasoner, a vision-language model posttrained solely on general-domain text for generalizable reasoning, using a twostage approach: an initial supervised fine-tuning phase with distilled long chainof-thoughts, followed by reinforcement learning with verifiable rewards. Experiments show that X-Reasoner successfully transfers reasoning capabilities to both multimodal and out-of-domain settings, outperforming existing state-of-theart models trained with in-domain and multimodal data across various general and medical benchmarks. More details can be found in the paper: X-Reasoner: T
Today, we're announcing Qwen3-Coder-Next-FP8, an open-weight language model designed specifically for coding agents and local development. It features the following key enhancements:
Today, we're announcing Qwen3-Coder-Next, an open-weight language model designed specifically for coding agents and local development. It features the following key enhancements:
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR deliver
Inference using Huggingface transformers on NVIDIA GPUs. Requirements tested on python 3.12.9 + CUDA11.8:
GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.
translategemma 27b it is een multimodaal taalmodel van Google met 29B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
translategemma 12b it is een multimodaal taalmodel van Google met 13B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
medgemma 1.5 4b it is een multimodaal taalmodel van Google met 4.3B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
FrogBoss is built on the Qwen3-32B transformer architecture with a maximum context length of 64k tokens. The model uses multi-turn debugging workflows and complex code reasoning. Unlike general-purpose LLMs, FrogBoss is specialized for software engineering tasks.
GLM-4.7, your new coding partner, is coming with the following features:
GLM-4.7, your new coding partner, is coming with the following features:
OptiMind-SFT is a specialized 20B parameter model designed to bridge the gap between natural language and executable optimization solvers. It automates the translation of complex decision-making problems—such as supply chain planning, scheduling, and resource allocation—into correct MILP formulations.
⚠️ This project is intended for research and educational purposes only. Any use for illegal data access, system interference, or unlawful activities is strictly prohibited. Please review our Terms of Use carefully.
⚠️ This project is intended for research and educational purposes only. Any use for illegal data access, system interference, or unlawful activities is strictly prohibited. Please review our Terms of Use carefully.
This model is part of the GLM-V family of models, introduced in the paper GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.
This model is part of the GLM-V family of models, introduced in the paper GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.
This model is part of the GLM-V family of models, introduced in the paper GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.
We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. Our approach is built upon three key technical breakthroughs:
[!Tip] This model was contributed by Xenova from Hugging Face. We sincerely appreciate the integration and community collaboration. While preliminary functionality checks have been performed, comprehensive testing has not yet been completed. We recommend you to proceed with caution and conducting your own evaluations for specific use cases. If any issues arise, open a PR/Issue here and we will try to address them promptly.
- Repository: https://github.com/zheny2751-dotcom/WebVIA - Paper: https://arxiv.org/pdf/2511.06251
- Repository: https://github.com/zai-org/UI2CodeN - Paper: https://arxiv.org/abs/2511.08195
Description: Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.
- Repository: https://github.com/thu-coai/Glyph - Paper: https://arxiv.org/abs/2510.17800
Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
This repository contains an FP8 quantized version of the Qwen3-VL-32B-Instruct model. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original BF16 model. Enjoy!
This repository contains an FP8 quantized version of the Qwen3-VL-32B-Thinking model. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original BF16 model. Enjoy!
Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
torch==2.6.0 transformers==4.46.3 tokenizers==0.20.3 einops addict easydict pip install flash-attn==2.7.3 --no-build-isolation
This repository contains an FP8 quantized version of the Qwen3-VL-8B-Instruct model. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original BF16 model. Enjoy!
This repository contains an FP8 quantized version of the Qwen3-VL-4B-Instruct model. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original BF16 model. Enjoy!
Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
This repository contains an FP8 quantized version of the Qwen3-VL-30B-A3B-Instruct model. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original BF16 model. Enjoy!
Unlike typical LLMs that are trained to play the role of the "assistant" in conversation, we trained UserLM-8b to simulate the “user” role in conversation (by training it to predict user turns in a large corpus of conversations called WildChat). This model is useful in simulating more realistic conversations, which is in turn useful in the development of more robust assistants.
Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
Compared with GLM-4.5, GLM-4.6 brings several key improvements:
Compared with GLM-4.5, GLM-4.6 brings several key improvements:
We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention—a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios.
This update maintains the model's original capabilities while addressing issues reported by users, including:
Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built-upon gpt-oss. With these models, you can classify text content based on safety policies that you provide and perform a suite of foundational safety tasks. These models are intended for safety use cases. For other applications, we recommend using gpt-oss models.
gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built-upon gpt-oss. With these models, you can classify text content based on safety policies that you provide and perform a suite of foundational safety tasks. These models are intended for safety use cases. For other applications, we recommend using gpt-oss models.
DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:
DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:
This model is part of the GLM-V family of models, introduced in the paper GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.
Vision-language models (VLMs) have become a key cornerstone of intelligent systems. As real-world AI tasks grow increasingly complex, VLMs urgently need to enhance reasoning capabilities beyond basic multimodal perception — improving accuracy, comprehensiveness, and intelligence — to enable complex problem solving, long-context understanding, and multimodal agents.
We introduce the updated version of the Qwen3-4B-FP8 non-thinking mode, named Qwen3-4B-Instruct-2507-FP8, featuring the following key enhancements:
We introduce the updated version of the Qwen3-4B non-thinking mode, named Qwen3-4B-Instruct-2507, featuring the following key enhancements:
Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.
Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.
Qwen3-Coder is available in multiple sizes. Today, we're excited to introduce Qwen3-Coder-30B-A3B-Instruct-FP8. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:
Qwen3-Coder is available in multiple sizes. Today, we're excited to introduce Qwen3-Coder-30B-A3B-Instruct. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:
The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieves strong performance across agentic, reasoning, and coding (ARC) tasks, scoring 70.1% on TAU-Bench, 91.0% on AIME 24, and 64.2% on SWE-bench Verified. With much fewer parameters than several competitors, GLM-4.5 ranks 3rd ove
The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
The MediPhi Model Collection comprises 7 small language models of 3.8B parameters from the base model Phi-3.5-mini-instruct specialized in the medical and clinical domains. The collection is designed in a modular fashion. Five MediPhi experts are fine-tuned on various medical corpora (i.e. PubMed commercial, Medical Wikipedia, Medical Guidelines, Medical Coding, and open-source clinical documents) and merged back with the SLERP method in their base model to conserve general abilities. One model combined all five experts into one general expert with the multi-model merging method BreadCrumbs. F
Dayhoff is an Atlas of both protein sequence data and generative language models — a centralized resource that brings together 3.34 billion protein sequences across 1.7 billion clusters of metagenomic and natural protein sequences (GigaRef), 46 million structure-derived synthetic sequences (BackboneRef), and 16 million multiple sequence alignments (OpenProteinSet). These models can natively predict zero-shot mutation effects on fitness, scaffold structural motifs by conditioning on evolutionary or structural context, and perform guided generation of novel proteins within specified families. Le
Vision-Language Models (VLMs) have become foundational components of intelligent systems. As real-world AI tasks grow increasingly complex, VLMs must evolve beyond basic multimodal perception to enhance their reasoning capabilities in complex tasks. This involves improving accuracy, comprehensiveness, and intelligence, enabling applications such as complex problem solving, long-context understanding, and multimodal agents.
Vision-Language Models (VLMs) have become foundational components of intelligent systems. As real-world AI tasks grow increasingly complex, VLMs must evolve beyond basic multimodal perception to enhance their reasoning capabilities in complex tasks. This involves improving accuracy, comprehensiveness, and intelligence, enabling applications such as complex problem solving, long-context understanding, and multimodal agents.
Phi-tiny-MoE is a lightweight Mixture of Experts (MoE) model with 3.8B total parameters and 1.1B activated parameters. It is compressed and distilled from the base model shared by Phi-3.5-MoE and GRIN-MoE using the SlimMoE approach, then post-trained via supervised fine-tuning and direct preference optimization for instruction following and safety. The model is trained on Phi-3 synthetic data and filtered public documents, with a focus on high-quality, reasoning-dense content. It is part of the SlimMoE series, which includes a larger variant, Phi-mini-MoE, with 7.6B total and 2.4B activated pa
Phi-mini-MoE is a lightweight Mixture of Experts (MoE) model with 7.6B total parameters and 2.4B activated parameters. It is compressed and distilled from the base model shared by Phi-3.5-MoE and GRIN-MoE using the SlimMoE approach, then post-trained via supervised fine-tuning and direct preference optimization for instruction following and safety. The model is trained on Phi-3 synthetic data and filtered public documents, with a focus on high-quality, reasoning-dense content. It is part of the SlimMoE series, which includes a smaller variant, Phi-tiny-MoE, with 3.8B total and 1.1B activated p
gemma 3n E2B it is een multimodaal taalmodel van Google met 5.4B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 3n E4B it is een multimodaal taalmodel van Google met 7.8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
This model was introduced in the paper GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents. It is developed based on UI-TARS-2B-SFT and is designed to predict the correctness of an action position given a language instruction. This model is well-suited for GUI-Actor, as its attention map effectively provides diverse candidates for verification with only a single inference.
The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro.
The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro.
medgemma 27b text it is een open-source taalmodel van Google met 27B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
medgemma 4b it is een multimodaal taalmodel van Google met 4.3B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
Phi-4-mini-reasoning is a lightweight open model built upon synthetic data with a focus on high-quality, reasoning dense data further finetuned for more advanced math reasoning capabilities. The model belongs to the Phi-4 model family and supports 128K token context length.
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Building upon extensive advancements in training data, model architecture, and optimization techniques, Qwen3 delivers the following key improvements over the previously released Qwen2.5:
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Building upon extensive advancements in training data, model architecture, and optimization techniques, Qwen3 delivers the following key improvements over the previously released Qwen2.5:
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
Llama Guard 4 12B is een multimodaal taalmodel van Meta met 12B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
[!IMPORTANT] To fully take advantage of the model's capabilities, inference must use temperature=0.8, topk=50, topp=0.95, and dosample=True. For more complex queries, set maxnewtokens=32768 to allow for longer chain-of-thought (CoT).
[!IMPORTANT] To fully take advantage of the model's capabilities, inference must use temperature=0.8, topk=50, topp=0.95, and dosample=True. For more complex queries, set maxnewtokens=32768 to allow for longer chain-of-thought (CoT).
gemma 3 12b it qat q4 0 unquantized is een multimodaal taalmodel van Google met 12B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
The GLM family welcomes a new generation of open-source models, the GLM-4-32B-0414 series, featuring 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very user-friendly local deployment features. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-type synthetic data, laying the foundation for subsequent reinforcement learning extensions. In the post-training stage, in addition to human preference alignment for dialogue scenarios, we also enhanced the model's performance i
The GLM family welcomes a new generation of open-source models, the GLM-4-32B-0414 series, featuring 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very user-friendly local deployment features. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-type synthetic data, laying the foundation for subsequent reinforcement learning extensions. In the post-training stage, in addition to human preference alignment for dialogue scenarios, we also enhanced the model's performance i
The GLM family welcomes new members, the GLM-4-32B-0414 series models, featuring 32 billion parameters. Its performance is comparable to OpenAI’s GPT series and DeepSeek’s V3/R1 series. It also supports very user-friendly local deployment features. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including substantial reasoning-type synthetic data. This lays the foundation for subsequent reinforcement learning extensions. In the post-training stage, we employed human preference alignment for dialogue scenarios. Additionally, using techniques like rejection sampling and reinforc
The GLM family welcomes new members, the GLM-4-32B-0414 series models, featuring 32 billion parameters. Its performance is comparable to OpenAI’s GPT series and DeepSeek’s V3/R1 series. It also supports very user-friendly local deployment features. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including substantial reasoning-type synthetic data. This lays the foundation for subsequent reinforcement learning extensions. In the post-training stage, we employed human preference alignment for dialogue scenarios. Additionally, using techniques like rejection sampling and reinforc
Llama 4 Maverick 17B 128E is een multimodaal taalmodel van Meta met 402B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 4 Scout 17B 16E is een multimodaal taalmodel van Meta met 109B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 4 Scout 17B 16E Instruct is een multimodaal taalmodel van Meta met 109B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 4 Maverick 17B 128E Instruct is een multimodaal taalmodel van Meta met 402B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 4 Maverick 17B 128E Instruct FP8 is een multimodaal taalmodel van Meta met 402B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
DeepSeek-V3-0324 demonstrates notable improvements over its predecessor, DeepSeek-V3, in several key aspects.
txgemma 9b chat is een open-source taalmodel van Google met 9.2B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
txgemma 2b predict is een open-source taalmodel van Google met 2.6B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL.
gemma 3 1b it is een open-source taalmodel van Google met 1B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 3 12b it is een multimodaal taalmodel van Google met 12B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 3 12b pt is een multimodaal taalmodel van Google met 12B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 3 27b it is een multimodaal taalmodel van Google met 27B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 3 27b pt is een multimodaal taalmodel van Google met 27B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 3 1b pt is een open-source taalmodel van Google met 1B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 3 4b it is een multimodaal taalmodel van Google met 4.3B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 3 4b pt is een multimodaal taalmodel van Google met 4.3B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
🎉Phi-4: [mini-reasoning | reasoning] | [multimodal-instruct | onnx]; [mini-instruct | onnx]
In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL.
--- licensename: qwen-research licenselink: https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct-AWQ/blob/main/LICENSE language: - en pipelinetag: image-text-to-text tags: - multimodal libraryname: transformers basemodel: - Qwen/Qwen2.5-VL-3B-Instruct ---
--- license: other licensename: qwen licenselink: https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct/blob/main/LICENSE language: - en pipelinetag: image-text-to-text tags: - multimodal libraryname: transformers ---
--- license: apache-2.0 language: - en pipelinetag: image-text-to-text tags: - multimodal libraryname: transformers ---
--- licensename: qwen-research licenselink: https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct/blob/main/LICENSE language: - en pipelinetag: image-text-to-text tags: - multimodal libraryname: transformers ---
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorpor
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorpor
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorpor
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorpor
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorpor
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorpor
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorpor
If you are using the weights from this repository, please update to
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning
The CogAgent-9B-20241220 model is based on GLM-4V-9B, a bilingual open-source VLM base model. Through data collection and optimization, multi-stage training, and strategy improvements, CogAgent-9B-20241220 achieves significant advancements in GUI perception, inference prediction accuracy, action space completeness, and task generalizability. The model supports bilingual (Chinese and English) interaction with both screenshots and language input.
Our training data is an extension of the data used for Phi-3 and includes a wide variety of sources from:
Llama 3.3 70B Instruct is een open-source taalmodel van Meta met 71B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Install the transformers library from the source code:
Install the transformers library from the source code:
paligemma2 28b mix 448 is een multimodaal taalmodel van Google met 28B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
paligemma2 3b pt 448 is een multimodaal taalmodel van Google met 3B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
paligemma2 3b pt 224 is een multimodaal taalmodel van Google met 3B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
paligemma2 3b ft docci 448 is een multimodaal taalmodel van Google met 3B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
paligemma2 3b mix 224 is een multimodaal taalmodel van Google met 3B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Install the transformers library from the source code:
Install the transformers library from the source code:
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
If you are using the weights from this repository, please update to
If you are using the weights from this repository, please update to
This model hub includes a finetuned version of YOLOv8 and a finetuned BLIP-2 model on the above dataset respectively. For more details of the models used and finetuning, please refer to the paper.
gemma 2 2b jpn it is een open-source taalmodel van Google met 2.6B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama Guard 3 1B is een open-source taalmodel van Meta met 1.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama Guard 3 11B Vision is een multimodaal taalmodel van Meta met 11B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.2 90B Vision Instruct is een multimodaal taalmodel van Meta met 89B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.2 90B Vision is een multimodaal taalmodel van Meta met 89B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.2 11B Vision Instruct is een multimodaal taalmodel van Meta met 11B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.2 11B Vision is een multimodaal taalmodel van Meta met 11B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.2 3B is een open-source taalmodel van Meta met 3.2B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.2 3B Instruct is een open-source taalmodel van Meta met 3.2B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.2 1B Instruct is een open-source taalmodel van Meta met 1.2B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.2 1B is een open-source taalmodel van Meta met 1.2B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
gemma 7b aps it is een open-source taalmodel van Google met 8.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found here.
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. For model details, please visit DeepSeek-V2 page for more information.
We're excited to unveil Qwen2-VL, the latest iteration of our Qwen-VL model, representing nearly a year of innovation.
We're excited to unveil Qwen2-VL, the latest iteration of our Qwen-VL model, representing nearly a year of innovation.
We're excited to unveil Qwen2-VL, the latest iteration of our Qwen-VL model, representing nearly a year of innovation.
Phi-3.5-MoE is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available documents - with a focus on very high-quality, reasoning dense data. The model supports multilingual and comes with 128K context length (in tokens). The model underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures.
Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
🎉Phi-4: [multimodal-instruct | onnx]; [mini-instruct | onnx]
LongWriter-glm4-9b is trained based on glm-4-9b, and is capable of generating 10,000+ words at once.
LongWriter-llama3.1-8b is trained based on Meta-Llama-3.1-8B, and is capable of generating 10,000+ words at once.
AWQ quantized version of google/gemma-2b.
Llama Guard 3 8B is een open-source taalmodel van Meta met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama Guard 3 8B INT8 is een open-source taalmodel van Meta met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.1 405B FP8 is een open-source taalmodel van Meta met 406B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.1 405B Instruct FP8 is een open-source taalmodel van Meta met 406B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.1 8B Instruct is een open-source taalmodel van Meta met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
DeepSeek-V2-Chat-0628 is an improved version of DeepSeek-V2-Chat. For model details, please visit DeepSeek-V2 page for more information.
shieldgemma 9b is een open-source taalmodel van Google met 9.2B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
shieldgemma 2b is een open-source taalmodel van Google met 2.6B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.1 405B Instruct is een open-source taalmodel van Meta met 406B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.1 70B Instruct is een open-source taalmodel van Meta met 71B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 2 2b it is een open-source taalmodel van Google met 2.6B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 2 2b is een open-source taalmodel van Google met 2.6B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.1 405B is een open-source taalmodel van Meta met 406B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.1 70B is een open-source taalmodel van Meta met 71B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.1 8B is een open-source taalmodel van Meta met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
We introduce CodeGeeX4-ALL-9B, the open-source version of the latest CodeGeeX4 model series. It is a multilingual code generation model continually trained on the GLM-4-9B, significantly enhancing its code generation capabilities. Using a single CodeGeeX4-ALL-9B model, it can support comprehensive functions such as code completion and generation, code interpreter, web search, function call, repository-level code Q&A, covering various scenarios of software development. CodeGeeX4-ALL-9B has achieved highly competitive performance on public benchmarks, such as BigCodeBench and NaturalCodeBench. I
gemma 2 9b is een open-source taalmodel van Google met 9.2B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 2 9b it is een open-source taalmodel van Google met 9.2B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 2 27b is een open-source taalmodel van Google met 27B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 2 27b it is een open-source taalmodel van Google met 27B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
This Hub repository contains a HuggingFace's transformers implementation of Florence-2 model from Microsoft.
This Hub repository contains a HuggingFace's transformers implementation of Florence-2 model from Microsoft.
This Hub repository contains a HuggingFace's transformers implementation of Florence-2 model from Microsoft.
This is a continued pretrained version of Florence-2-large model with 4k context length, only 0.1B samples are used for continue pretraining, thus it might not be trained well. In addition, OCR task has been updated with line separator ('\n'). COCO OD AP 39.8
In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found here.
In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found here.
In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found here.
Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 7B Qwen2 model.
2024/08/12, 本仓库代码已更新并使用 transformers=4.44.0, 请及时更新依赖。
Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 1.5B Qwen2 model.
Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the 0.5B Qwen2 base language model.
🎉 Phi-3.5: [[mini-instruct]](https://huggingface.co/microsoft/Phi-3.5-mini-instruct); [[MoE-instruct]](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) ; [[vision-instruct]](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
Last week, the release and buzz around DeepSeek-V2 have ignited widespread interest in MLA (Multi-head Latent Attention)! Many in the community suggested open-sourcing a smaller MoE model for in-depth research. And now DeepSeek-V2-Lite comes out:
Last week, the release and buzz around DeepSeek-V2 have ignited widespread interest in MLA (Multi-head Latent Attention)! Many in the community suggested open-sourcing a smaller MoE model for in-depth research. And now DeepSeek-V2-Lite comes out:
paligemma 3b ft cococap 448 is een multimodaal taalmodel van Google met 2.9B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
paligemma 3b pt 448 is een multimodaal taalmodel van Google met 2.9B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
paligemma 3b mix 224 is een multimodaal taalmodel van Google met 2.9B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
paligemma 3b pt 224 is een multimodaal taalmodel van Google met 2.9B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
🎉 Phi-3.5: [[mini-instruct]](https://huggingface.co/microsoft/Phi-3.5-mini-instruct); [[MoE-instruct]](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) ; [[vision-instruct]](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
🎉 Phi-3.5: [[mini-instruct]](https://huggingface.co/microsoft/Phi-3.5-mini-instruct); [[MoE-instruct]](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) ; [[vision-instruct]](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
🎉 Phi-3.5: [[mini-instruct]](https://huggingface.co/microsoft/Phi-3.5-mini-instruct); [[MoE-instruct]](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) ; [[vision-instruct]](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
🎉 Phi-3.5: [[mini-instruct]](https://huggingface.co/microsoft/Phi-3.5-mini-instruct); [[MoE-instruct]](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) ; [[vision-instruct]](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
Due to the constraints of HuggingFace, the open-source code currently experiences slower performance than our internal codebase when running on GPUs with Huggingface. To facilitate the efficient execution of our model, we offer a dedicated vllm solution that optimizes performance for running our model effectively.
🎉Phi-4: [multimodal-instruct | onnx]; [mini-instruct | onnx]
🎉 Phi-3.5: [[mini-instruct]](https://huggingface.co/microsoft/Phi-3.5-mini-instruct); [[MoE-instruct]](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) ; [[vision-instruct]](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
Due to the constraints of HuggingFace, the open-source code currently experiences slower performance than our internal codebase when running on GPUs with Huggingface. To facilitate the efficient execution of our model, we offer a dedicated vllm solution that optimizes performance for running our model effectively.
Meta Llama Guard 2 8B is een open-source taalmodel van Meta met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Meta Llama 3 8B is een open-source taalmodel van Meta met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Meta Llama 3 8B Instruct is een open-source taalmodel van Meta met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Meta Llama 3 70B Instruct is een open-source taalmodel van Meta met 71B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Meta Llama 3 70B is een open-source taalmodel van Meta met 71B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 1.1 2b it is een open-source taalmodel van Google met 2.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 1.1 7b it is een open-source taalmodel van Google met 8.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
codegemma 7b is een open-source taalmodel van Google met 8.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
codegemma 2b is een open-source taalmodel van Google met 2.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
CodeLlama 34b Instruct hf is een open-source taalmodel van Meta met 34B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
CodeLlama 70b Instruct hf is een open-source taalmodel van Meta met 69B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
CodeLlama 70b hf is een open-source taalmodel van Meta met 69B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
CodeLlama 13b Instruct hf is een open-source taalmodel van Meta met 13B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
CodeLlama 13b hf is een open-source taalmodel van Meta met 13B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
CodeLlama 7b Instruct hf is een open-source taalmodel van Meta met 6.7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
CodeLlama 7b Python hf is een open-source taalmodel van Meta met 6.7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
CodeLlama 7b hf is een open-source taalmodel van Meta met 7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 7b it is een open-source taalmodel van Google met 8.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 7b is een open-source taalmodel van Google met 8.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 2b it is een open-source taalmodel van Google met 2.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 2b is een open-source taalmodel van Google met 2.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
❗❗❗ Please use chain-of-thought prompt to test DeepSeekMath-Instruct and DeepSeekMath-RL:
Deepseek-Coder-7B-Instruct-v1.5 is continue pre-trained from Deepseek-LLM 7B on 2T tokens by employing a window size of 4K and next token prediction objective, and then fine-tuned on 2B tokens of instruction data.
python import torch from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
modelname = "deepseek-ai/deepseek-moe-16b-base" tokenizer = AutoTokenizer.frompretrained(modelname) model = AutoModelForCausalLM.frompretrained(modelname, torchdtype=torch.bfloat16, devicemap="auto") model.generationconfig = GenerationConfig.frompretrained(modelname) model.generationconfig.padtokenid = model.generationconfig.eostokenid
Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showcased a nearly state-of-the-art performance among models with less than 13 billion parameters.
py from mistralcommon.tokens.tokenizers.mistral import MistralTokenizer from mistralcommon.protocol.instruct.messages import UserMessage from mistralcommon.protocol.instruct.request import ChatCompletionRequest
LlamaGuard 7b is een open-source taalmodel van Meta met 6.7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.
Introducing DeepSeek LLM, an advanced language model comprising 7 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.
Introducing DeepSeek LLM, an advanced language model comprising 7 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.
- Repository: https://github.com/thu-coai/BPO - Paper: https://arxiv.org/abs/2311.04155 - Data: https://huggingface.co/datasets/THUDM/BPO
Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and a extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various
Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and a extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various
Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and a extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various
Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and a extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various
Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and a extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various
py from mistralcommon.tokens.tokenizers.mistral import MistralTokenizer from mistralcommon.protocol.instruct.messages import UserMessage from mistralcommon.protocol.instruct.request import ChatCompletionRequest
The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested.
The language model Phi-1 is a Transformer with 1.3 billion parameters, specialized for basic Python coding. Its training involved a variety of data sources, including subsets of Python codes from The Stack v1.2, Q&A content from StackOverflow, competition code from codecontests, and synthetic Python textbooks and exercises generated by gpt-3.5-turbo-0301. Even though the model and the datasets are relatively small compared to contemporary Large Language Models (LLMs), Phi-1 has demonstrated an impressive accuracy rate exceeding 50% on the simple Python coding benchmark, HumanEval.
The language model Phi-1.5 is a Transformer with 1.3 billion parameters. It was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts. When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-1.5 demonstrates a nearly state-of-the-art performance among models with less than 10 billion parameters.
Llama 2 70b chat hf is een open-source taalmodel van Meta met 69B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 2 7b chat hf is een open-source taalmodel van Meta met 6.7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 2 7b hf is een open-source taalmodel van Meta met 6.7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 2 13b hf is een open-source taalmodel van Meta met 13B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 2 13b chat hf is een open-source taalmodel van Meta met 13B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 2 70b hf is een open-source taalmodel van Meta met 69B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
CodeGPT small java is een open-source taalmodel van Microsoft, gehost op Europese GPU's via een OpenAI-compatibele API.
CodeGPT small java adaptedGPT2 is een open-source taalmodel van Microsoft, gehost op Europese GPU's via een OpenAI-compatibele API.
CodeGPT small py is een open-source taalmodel van Microsoft, gehost op Europese GPU's via een OpenAI-compatibele API.
DialoGPT is a SOTA large-scale pretrained dialogue response generation model for multiturn conversations. The human evaluation results indicate that the response generated from DialoGPT is comparable to human response quality under a single-turn conversation Turing test. The model is trained on 147M multi-turn dialogue from Reddit discussion thread.
DialoGPT is a SOTA large-scale pretrained dialogue response generation model for multiturn conversations. The human evaluation results indicate that the response generated from DialoGPT is comparable to human response quality under a single-turn conversation Turing test. The model is trained on 147M multi-turn dialogue from Reddit discussion thread.
DialoGPT is a SOTA large-scale pretrained dialogue response generation model for multiturn conversations. The human evaluation results indicate that the response generated from DialoGPT is comparable to human response quality under a single-turn conversation Turing test. The model is trained on 147M multi-turn dialogue from Reddit discussion thread.
deepseek coder 6.7b is een open-source taalmodel van DeepSeek met 6.7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
deepseek coder 7b instruct v1.5 is een open-source taalmodel van DeepSeek met 7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
DeepSeek Coder V2 Lite (16B) is een open-source taalmodel van DeepSeek met 16B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
DeepSeek R1 0528 Qwen3 8B is een open-source taalmodel van DeepSeek met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
DeepSeek R1 Distill 1.5B is een open-source taalmodel van DeepSeek met 1.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
DeepSeek R1 Distill 14B is een open-source taalmodel van DeepSeek met 14B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
DeepSeek R1 Distill 32B is een open-source taalmodel van DeepSeek met 32B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
DeepSeek R1 Distill Llama 70B is een open-source taalmodel van DeepSeek met 70B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
DeepSeek R1 Distill Llama 8B is een open-source taalmodel van DeepSeek met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
DeepSeek V3 (685B MoE) is een open-source taalmodel van DeepSeek met 685B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
functiongemma 270m is een open-source taalmodel van Google, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 3 12b is een open-source taalmodel van Google met 12B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Gemma 3 1B is een open-source taalmodel van Google met 1B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Gemma 3 27B is een open-source taalmodel van Google met 27B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Gemma 3 4B is een open-source taalmodel van Google met 4B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
gemma 3n E2B is een open-source taalmodel van Google met 2B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.1 70B is een open-source taalmodel van Meta met 70B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.1 8B is een open-source taalmodel van Meta met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.2 11B Vision is een open-source taalmodel van Meta met 11B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.2 1B is een open-source taalmodel van Meta met 1B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.2 3B is een open-source taalmodel van Meta met 3B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 3.3 70B is een open-source taalmodel van Meta met 70B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 4 Maverick (17Bx128E) is een open-source taalmodel van Meta met 17B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Llama 4 Scout (17Bx16E) is een open-source taalmodel van Meta met 17B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Sovereign EU model fine-tuned by HostYourAI on dutch-clean.
Sovereign EU model fine-tuned by HostYourAI on loes-xl-pre.
Sovereign EU model fine-tuned by HostYourAI on loes-xl-pre.
medgemma 27b is een open-source taalmodel van Google met 27B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
medgemma 4b is een open-source taalmodel van Google met 4B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Mistral Nemo 12B is een open-source taalmodel van Mistral met 12B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Phi 4 (14B) is een open-source taalmodel van Microsoft met 14B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Phi 4 Mini (3.8B) is een open-source taalmodel van Microsoft met 3.8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen 2.5 14B is een open-source taalmodel van Qwen met 14B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen 2.5 72B is een open-source taalmodel van Qwen met 72B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen 2.5 Coder 1.5B is een open-source taalmodel van Qwen met 1.5B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen 2.5 Coder 32B is een open-source taalmodel van Qwen met 32B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen 2.5 Coder 7B is een open-source taalmodel van Qwen met 7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen 3 0.6B is een open-source taalmodel van Qwen met 0.6B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen 3 4B is een open-source taalmodel van Qwen met 4B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen 3 Coder 30B-A3B (MoE) is een open-source taalmodel van Qwen met 30B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen2.5 32B is een open-source taalmodel van Qwen met 32B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen2.5 7B is een open-source taalmodel van Qwen met 7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen2.5 Coder 14B is een open-source taalmodel van Qwen met 14B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen2.5 VL 32B is een open-source taalmodel van Qwen met 32B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen2.5 VL 3B is een open-source taalmodel van Qwen met 3B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen2.5 VL 72B is een open-source taalmodel van Qwen met 72B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen2.5 VL 7B is een open-source taalmodel van Qwen met 7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 1.7B is een open-source taalmodel van Qwen met 1.7B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 14B is een open-source taalmodel van Qwen met 14B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 30B A3B is een open-source taalmodel van Qwen met 30B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 30B A3B Instruct 2507 is een open-source taalmodel van Qwen met 30B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 32B is een open-source taalmodel van Qwen met 32B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 4B Instruct 2507 is een open-source taalmodel van Qwen met 4B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 4B Thinking 2507 is een open-source taalmodel van Qwen met 4B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 8B is een open-source taalmodel van Qwen met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 Coder Next is een open-source taalmodel van Qwen, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 VL 2B is een open-source taalmodel van Qwen met 2B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 VL 30B A3B is een open-source taalmodel van Qwen met 30B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 VL 32B is een open-source taalmodel van Qwen met 32B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 VL 4B is een open-source taalmodel van Qwen met 4B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 VL 8B is een open-source taalmodel van Qwen met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3 VL 8B Thinking is een open-source taalmodel van Qwen met 8B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Qwen3Guard Gen 0.6B is een open-source taalmodel van Qwen met 0.6B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
translategemma 4b is een open-source taalmodel van Google met 4B parameters, gehost op Europese GPU's via een OpenAI-compatibele API.
Geen creditcard nodig. Betaal naar gebruik, stop wanneer je wilt.
Begin vandaag gratis met hosten