DeepSeek · Open Source · Apache 2.0

DeepSeek V4 — Redefining Intelligence

DeepSeek V4 is a Mixture-of-Experts (MoE) large language model with 1 trillion total parameters and 128 billion activated parameters per token. It achieves state-of-the-art performance across reasoning, coding, math, and multilingual tasks while maintaining exceptional training efficiency.

1T Total Parameters
128B Active Parameters
256K Context Window
#1 Open-Source LLM

What is DeepSeek V4?

DeepSeek V4 builds on the breakthroughs of DeepSeek V3 with a dramatically scaled architecture, improved training methodology, and enhanced reasoning capabilities. It represents a new frontier in open-source large language models.

Architecture Innovation

DeepSeek V4 introduces several architectural innovations that push the boundaries of efficient large-scale model training and inference.

🧠 Multi-Head Latent Attention (MLA)

An advanced attention mechanism that compresses key-value pairs into a low-dimensional latent space, dramatically reducing KV cache memory during inference while maintaining full attention expressiveness.
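
A minimal PyTorch sketch of the idea, with illustrative dimensions rather than DeepSeek's published configuration: only the small latent is cached during decoding, and full keys and values are re-expanded from it at attention time.

```python
# Minimal sketch of latent KV compression; all sizes are illustrative.
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress to latent
        self.k_up = nn.Linear(d_latent, d_model)      # re-expand at attention time
        self.v_up = nn.Linear(d_latent, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, kv_cache=None):
        b, t, _ = x.shape
        latent = self.kv_down(x)                      # (b, t, d_latent)
        if kv_cache is not None:                      # the cache stores only latents,
            latent = torch.cat([kv_cache, latent], 1) # a fraction of full K/V size
        heads = lambda z: z.view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        q = heads(self.q_proj(x))
        k, v = heads(self.k_up(latent)), heads(self.v_up(latent))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head**0.5, -1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, -1)   # (causal mask omitted)
        return self.out(y), latent                    # return latent as new cache
```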

DeepSeekMoE Architecture

A fine-grained mixture-of-experts design with 256 routed experts and 2 shared experts per layer. Each token activates only 128B of the 1T total parameters, achieving strong performance with efficient compute.
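
A toy sketch of the routing pattern with deliberately small sizes (the production layer uses 256 routed and 2 shared experts): shared experts process every token, while a learned gate sends each token to its top-k routed experts.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Illustrative fine-grained MoE: always-on shared experts plus
    top-k routing over many small routed experts. Toy sizes."""
    def __init__(self, d=512, n_routed=16, n_shared=2, top_k=4):
        super().__init__()
        make = lambda: nn.Sequential(nn.Linear(d, 2 * d), nn.SiLU(), nn.Linear(2 * d, d))
        self.shared = nn.ModuleList(make() for _ in range(n_shared))
        self.routed = nn.ModuleList(make() for _ in range(n_routed))
        self.gate = nn.Linear(d, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):                         # x: (tokens, d)
        y = sum(e(x) for e in self.shared)        # shared experts see every token
        scores = self.gate(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        for slot in range(self.top_k):            # add outputs of chosen experts
            for e_id in idx[:, slot].unique():
                mask = idx[:, slot] == e_id
                y[mask] += weights[mask, slot, None] * self.routed[e_id](x[mask])
        return y
```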

🎯 Multi-Token Prediction (MTP)

Predicts multiple future tokens simultaneously during training, improving data efficiency and enabling speculative decoding at inference time for up to 2× faster generation.
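
A simplified sketch of the extra training signal: independent heads supervised on tokens further ahead. Production MTP designs use deeper sequential prediction modules; only the loss structure is shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy multi-token prediction loss: alongside the usual next-token head,
# extra heads predict tokens 2 and 3 steps ahead.
def mtp_loss(hidden, targets, heads, depth=3):
    # hidden: (batch, seq, d) final hidden states; targets: (batch, seq) token ids
    loss = 0.0
    for k in range(1, depth + 1):                    # head k predicts token t+k
        logits = heads[k - 1](hidden[:, :-k])        # (batch, seq-k, vocab)
        loss += F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets[:, k:].reshape(-1),
        )
    return loss / depth

d, vocab = 64, 100
heads = nn.ModuleList(nn.Linear(d, vocab) for _ in range(3))
hidden = torch.randn(2, 16, d)
targets = torch.randint(0, vocab, (2, 16))
print(mtp_loss(hidden, targets, heads))
```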

🔬 FP8 Mixed-Precision Training

Pioneering use of FP8 floating-point format for both forward and backward passes, reducing memory footprint and accelerating training without sacrificing model quality.
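
A small sketch of the core mechanic, per-tensor scaled casting to an 8-bit float format (requires PyTorch 2.1+ for the float8 dtypes). Real FP8 training additionally relies on FP8 matmul kernels and finer-grained scaling, which this round-trip example omits.

```python
import torch

# Per-tensor scaled FP8 (e4m3) round trip: scale into the format's narrow
# dynamic range, cast, and keep the scale to dequantize later.
def to_fp8(x: torch.Tensor):
    fp8_max = torch.finfo(torch.float8_e4m3fn).max   # 448.0 for e4m3
    scale = fp8_max / x.abs().max().clamp(min=1e-12)
    return (x * scale).to(torch.float8_e4m3fn), scale

def from_fp8(x_fp8, scale):
    return x_fp8.to(torch.float32) / scale

w = torch.randn(4, 4)
w8, s = to_fp8(w)
print((w - from_fp8(w8, s)).abs().max())  # small quantization error
```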

Core Capabilities

DeepSeek V4 excels across a comprehensive range of tasks, from complex reasoning to creative generation.

💡 Advanced Reasoning

State-of-the-art performance on mathematical reasoning (MATH, GSM8K), logical deduction, and multi-step problem solving. Competitive with leading proprietary models.

💻 Code Generation

Top-tier coding ability across HumanEval, MBPP, and LiveCodeBench. Supports 50+ programming languages with strong debugging and code review capabilities.

🌍 Multilingual Mastery

Native-level fluency in English and Chinese, strong performance in Japanese, Korean, French, German, and Spanish, and functional coverage of 100+ additional languages with strong cross-lingual transfer.

📄 Long Context Understanding

256K token context window with near-perfect retrieval accuracy. Handles entire codebases, lengthy documents, and complex multi-turn conversations.

🎯 Instruction Following

Precisely follows complex, multi-constraint instructions. Excels at structured output, role-playing, and nuanced creative writing tasks.

🔧 Tool & Function Calling

Robust function calling and tool-use capabilities. Seamlessly integrates with APIs, databases, and external services for agentic workflows.
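
As a sketch, a function-calling request over an OpenAI-compatible client; the model id, endpoint, and get_weather tool schema are illustrative assumptions rather than published identifiers.

```python
from openai import OpenAI

# Hypothetical endpoint and model id, shown only to illustrate the pattern.
client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                      # hypothetical tool
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-v4",                            # assumed model id
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)           # the call the model requests
```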

Benchmark Performance

DeepSeek V4 achieves top-tier results across major benchmarks, rivaling or surpassing proprietary models at a fraction of the cost.

Model             | MATH-500 | GSM8K | GPQA Diamond
DeepSeek V4       |     92.8 |  97.1 |         65.4
GPT-4o            |     76.6 |  95.8 |         53.6
Claude 3.5 Sonnet |     78.3 |  96.4 |         59.4
Llama 3.1 405B    |     73.8 |  96.8 |         51.1

Training Efficiency

DeepSeek V4 was trained with remarkable cost-efficiency, demonstrating that frontier-level AI does not require frontier-level budgets.

📊 14.8T Training Tokens

Trained on 14.8 trillion high-quality tokens spanning web data, code, academic papers, books, and curated multilingual corpora.

💰 Cost-Efficient Training

Full training completed on a cluster of 2048 NVIDIA H800 GPUs in approximately 60 days — a fraction of the cost of comparable proprietary models.

⚖️ Auxiliary-Loss-Free Balancing

A novel load-balancing strategy for MoE that avoids auxiliary losses, preventing performance degradation while maintaining even expert utilization.
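
A rough sketch of how bias-based balancing of this kind can work (step size and expert count are illustrative, not the exact published algorithm): a per-expert bias steers top-k selection toward underloaded experts, while the gate weights that scale expert outputs still come from the raw scores.

```python
import torch

# Per-expert bias adjusts routing without any auxiliary loss term.
n_experts, top_k, gamma = 8, 2, 1e-3
bias = torch.zeros(n_experts)

def route(scores):                                   # scores: (tokens, n_experts)
    global bias
    _, idx = (scores + bias).topk(top_k, dim=-1)     # bias steers selection only
    load = torch.zeros(n_experts)
    load.scatter_add_(0, idx.flatten(), torch.ones(idx.numel()))
    bias += gamma * torch.sign(load.mean() - load)   # nudge underloaded experts up
    weights = scores.gather(-1, idx)                 # gate weights use raw scores
    return idx, weights / weights.sum(-1, keepdim=True)

idx, w = route(torch.rand(32, n_experts))
```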

🔄 Multi-Stage Pipeline

Pre-training → Supervised Fine-Tuning → Reinforcement Learning from Human Feedback (RLHF) with Group Relative Policy Optimization (GRPO).
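
A minimal sketch of the step that distinguishes GRPO from PPO-style RLHF: advantages are computed relative to a group of sampled responses for the same prompt, removing the need for a separate value model. The reward values below are made up for illustration.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (n_prompts, group_size) scalar rewards per sampled response
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True).clamp(min=1e-6)
    return (rewards - mean) / std   # each response scored against its own group

rewards = torch.tensor([[0.1, 0.9, 0.4, 0.7]])   # 4 sampled answers, one prompt
print(group_relative_advantages(rewards))
```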

Open Source

DeepSeek V4 is fully open-source under the Apache 2.0 license, empowering the global AI community.

Full Model Weights

Complete model weights released for both the base model and the chat-optimized variant. No restrictions on commercial use.

Training Transparency

Detailed technical report covering architecture decisions, training methodology, data composition, and ablation studies.

Community Ecosystem

Compatible with vLLM, SGLang, TensorRT-LLM, and other popular inference frameworks. Active community with thousands of fine-tuned variants.
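
For example, offline inference through vLLM might look like the sketch below; the Hugging Face repo id deepseek-ai/DeepSeek-V4 is an assumed placeholder.

```python
from vllm import LLM, SamplingParams

# Assumed repo id; tensor_parallel_size matches an 8-GPU node.
llm = LLM(model="deepseek-ai/DeepSeek-V4", tensor_parallel_size=8)
out = llm.generate(["Explain MoE routing in one paragraph."],
                   SamplingParams(max_tokens=128, temperature=0.7))
print(out[0].outputs[0].text)
```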

Use Cases

💻 AI-Powered Development

Code generation, debugging, code review, and automated testing across 50+ programming languages with context-aware suggestions.

🔬 Research & Analysis

Process and synthesize information from lengthy documents, academic papers, and complex datasets with 256K context.

🏢 Enterprise Applications

Build intelligent agents, customer service bots, and workflow automation with robust function calling and tool integration.

🎓 Education & Tutoring

Step-by-step mathematical reasoning, multilingual tutoring, and adaptive learning experiences powered by advanced reasoning.

Frequently Asked Questions

What is DeepSeek V4?

DeepSeek V4 is a 1-trillion-parameter Mixture-of-Experts large language model developed by DeepSeek. It activates 128 billion parameters per token and achieves state-of-the-art performance across reasoning, coding, math, and multilingual benchmarks while being fully open-source under Apache 2.0.

How does DeepSeek V4 compare to GPT-4o and Claude 3.5 Sonnet?

DeepSeek V4 achieves competitive or superior performance on most benchmarks compared to GPT-4o and Claude 3.5 Sonnet, particularly excelling in mathematical reasoning and code generation. It is the strongest open-source model available.

What hardware do I need to run DeepSeek V4?

The full model requires multiple high-end GPUs (8× A100/H100 80GB or more) for inference. Quantized versions (INT4/INT8) can run on smaller setups. Cloud API access is also available for those without dedicated hardware.

Can I use DeepSeek V4 commercially?

Yes. The model weights are released under the Apache 2.0 license with no restrictions on commercial use. DeepSeek also offers an API service with competitive pricing.

How long a context can DeepSeek V4 handle?

DeepSeek V4 supports a 256K token context window, enabling it to process entire codebases, lengthy documents, and extended conversations with near-perfect retrieval accuracy.

Which languages does DeepSeek V4 support?

DeepSeek V4 has native-level fluency in English and Chinese, strong performance in Japanese, Korean, French, German, and Spanish, and functional capability in 100+ additional languages.

Experience DeepSeek V4

Explore the most powerful open-source language model. Read the technical report or try the API.