DeepSeek-V3

Introduction

DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated for each token. It adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture for efficient inference and cost-effective training. DeepSeek-V3 is trained on 14.8 trillion tokens and employs an auxiliary-loss-free load-balancing strategy together with a multi-token prediction training objective. It outperforms other open-source models and achieves performance comparable to leading closed-source models, requiring only 2.788 million H800 GPU hours for its full training.

Architecture

DeepSeek-V3 builds on DeepSeek-V2's architecture, adding an auxiliary-loss-free load-balancing strategy and a Multi-Token Prediction (MTP) training objective. It uses FP8 mixed-precision training and achieves efficiency through computation-communication overlap that largely hides cross-node MoE communication cost, enabling large-scale training without extra overhead. Post-training distills reasoning capabilities from DeepSeek-R1 series models into DeepSeek-V3.
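
To make the auxiliary-loss-free load balancing concrete, below is a minimal sketch of bias-adjusted top-k expert routing. The tensor shapes, function names, and the sign-based update rule are illustrative assumptions, not the released implementation.

import torch

# Illustrative settings (assumptions): 8 routed experts, 2 chosen per token.
num_experts, top_k, bias_update_speed = 8, 2, 0.001

def route(scores: torch.Tensor, bias: torch.Tensor):
    """scores: [tokens, num_experts] affinities; bias: [num_experts]."""
    # The bias only influences which experts are selected; the original
    # scores still provide the gating weights of the chosen experts.
    _, expert_ids = torch.topk(scores + bias, k=top_k, dim=-1)
    gate = torch.gather(scores, -1, expert_ids)
    gate = gate / gate.sum(dim=-1, keepdim=True)
    return expert_ids, gate

def update_bias(bias: torch.Tensor, expert_ids: torch.Tensor) -> torch.Tensor:
    # Nudge the bias down for overloaded experts and up for underloaded ones,
    # so balancing needs no auxiliary loss term in the training objective.
    load = torch.bincount(expert_ids.flatten(), minlength=num_experts).float()
    return bias - bias_update_speed * torch.sign(load - load.mean())

Because the bias affects only expert selection and never the gating weights, the load can be balanced without an auxiliary loss that would otherwise trade off against model quality.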

Training

DeepSeek-V3's pre-training uses an FP8 mixed-precision framework, validating FP8 training at this scale and keeping pre-training costs to 2.664 million H800 GPU hours. The model is pre-trained on 14.8 trillion diverse, high-quality tokens, followed by supervised fine-tuning and reinforcement learning. The training process is stable throughout, with no irrecoverable loss spikes and no rollbacks.
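
As a rough illustration of the FP8 idea, the sketch below quantizes a weight matrix to FP8 with one scale per block. The block size, dtype, and function names are assumptions for illustration, not the framework's actual kernels.

import torch

FP8_MAX = 448.0  # largest magnitude representable in float8_e4m3fn

def quantize_blockwise(weight: torch.Tensor, block: int = 128):
    """Quantize a 2-D float32 weight to FP8 with one scale per block x block
    tile. Assumes both dimensions are divisible by `block`."""
    rows, cols = weight.shape
    w = weight.reshape(rows // block, block, cols // block, block)
    # A per-tile scale keeps local outliers from degrading the whole tensor.
    scale = w.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12) / FP8_MAX
    q = (w / scale).to(torch.float8_e4m3fn).reshape(rows, cols)
    return q, scale.reshape(rows // block, cols // block)

def dequantize_blockwise(q: torch.Tensor, scale: torch.Tensor, block: int = 128):
    """Recover an approximate float32 weight from FP8 values and per-tile scales."""
    rows, cols = q.shape
    w = q.to(torch.float32).reshape(rows // block, block, cols // block, block)
    return (w * scale[:, None, :, None]).reshape(rows, cols)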

Model Stats

  • Total Parameters: 671 billion
  • Activated Parameters: 37 billion
  • Context Length: 128K

Guide: Running Locally

DeepSeek-V3 can be deployed locally using a range of open-source frameworks and hardware options:

  1. DeepSeek-Infer Demo: Provides a lightweight demo for FP8 and BF16 inference.
  2. SGLang: Supports DeepSeek-V3 in FP8 and BF16 inference modes.
  3. LMDeploy: Offers efficient FP8 and BF16 inference for local and cloud deployment.
  4. TensorRT-LLM: Supports BF16 inference, with FP8 support planned.
  5. vLLM: Supports FP8 and BF16 inference with tensor parallelism (see the sketch after this list).
  6. AMD GPU: Runs DeepSeek-V3 on AMD GPUs via SGLang.
  7. Huawei Ascend NPU: Supports DeepSeek-V3 on Huawei Ascend devices.
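
As a concrete starting point for option 5, here is a minimal vLLM sketch; the model id, tensor-parallel degree, and sampling settings are assumptions to adapt to your own hardware.

from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # Hugging Face model id (assumption)
    tensor_parallel_size=8,           # shard the model across 8 GPUs
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain Mixture-of-Experts in one paragraph."], params)
print(outputs[0].outputs[0].text)

On hardware without native FP8 support, convert the FP8 checkpoint to BF16 first using the command below.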

To convert FP8 weights to BF16, use the following command:

cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights

Recommended Cloud GPUs: NVIDIA GPUs (such as the H800) or AMD GPUs for optimal performance.

License

The code repository is licensed under the MIT License. Use of the DeepSeek-V3 models is subject to the Model License, which permits commercial use.
