DeepSeek-V3
deepseek-ai/DeepSeek-V3
Introduction
DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated for each token. It adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture for efficient inference and cost-effective training, and it is trained on 14.8 trillion tokens using an auxiliary-loss-free load-balancing strategy and a multi-token prediction training objective. DeepSeek-V3 outperforms other open-source models and is comparable to leading closed-source models, while its full training required only 2.788 million H800 GPU hours.
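To make the activated-parameter figure concrete, here is a minimal sketch of generic top-k MoE routing in PyTorch: each token is dispatched to only a few experts, so only a fraction of all parameters runs per token. The hidden size, expert count, and top-k below are toy placeholders, not DeepSeek-V3's actual configuration.

# Toy top-k Mixture-of-Experts routing: each token activates only top_k of
# num_experts experts, so only a slice of the total parameters is used per token.
import torch
import torch.nn.functional as F

hidden_size, num_experts, top_k = 16, 8, 2
tokens = torch.randn(4, hidden_size)                      # 4 example tokens
router = torch.nn.Linear(hidden_size, num_experts, bias=False)
experts = torch.nn.ModuleList(
    [torch.nn.Linear(hidden_size, hidden_size) for _ in range(num_experts)]
)

scores = F.softmax(router(tokens), dim=-1)                # (tokens, experts)
gate, expert_ids = scores.topk(top_k, dim=-1)             # pick top_k experts per token
gate = gate / gate.sum(dim=-1, keepdim=True)              # renormalize gate weights

out = torch.zeros_like(tokens)
for t in range(tokens.size(0)):                           # naive per-token dispatch
    for w, e in zip(gate[t], expert_ids[t]):
        out[t] += w * experts[int(e)](tokens[t])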
Architecture
DeepSeek-V3 builds on DeepSeek-V2's architecture and adds two innovations: an auxiliary-loss-free load-balancing strategy and a Multi-Token Prediction (MTP) training objective. Training uses an FP8 mixed-precision framework and overlaps computation with communication, allowing the model to be scaled up without additional overhead. Post-training includes knowledge distillation from DeepSeek-R1 models to enhance reasoning capabilities.
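As a rough sketch of the auxiliary-loss-free balancing idea described in the technical report: a per-expert bias influences only which experts are selected, while the gating weights still come from the unbiased scores, and the bias is nudged after each step toward under-used experts. The expert count, top-k, and update step gamma below are illustrative placeholders, not the real configuration.

# Sketch of bias-based, auxiliary-loss-free load balancing for MoE routing.
import torch

num_experts, top_k, gamma = 8, 2, 0.001
bias = torch.zeros(num_experts)                     # per-expert routing bias

def route(scores):
    # scores: (tokens, experts) affinity scores for one batch.
    # The bias affects which experts are selected ...
    _, expert_ids = (scores + bias).topk(top_k, dim=-1)
    # ... but gating weights are computed from the unbiased scores.
    gate = torch.gather(scores, -1, expert_ids)
    gate = gate / gate.sum(dim=-1, keepdim=True)
    return expert_ids, gate

def update_bias(expert_ids):
    # After each step, lower the bias of overloaded experts and raise it
    # for underloaded ones, steering future tokens toward balance.
    load = torch.bincount(expert_ids.flatten(), minlength=num_experts).float()
    bias.sub_(gamma * torch.sign(load - load.mean()))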
Training
DeepSeek-V3's pre-training uses an FP8 mixed-precision framework, validated for the first time at this scale, which keeps pre-training costs to 2.664 million H800 GPU hours. The model is pre-trained on 14.8 trillion diverse, high-quality tokens, followed by supervised fine-tuning and reinforcement learning stages. The training process was remarkably stable, with no irrecoverable loss spikes.
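A toy illustration of the FP8 idea: scale a tensor into the representable range of an 8-bit floating-point format, cast, and rescale afterwards. This uses simple per-tensor scaling, whereas DeepSeek-V3's framework applies finer-grained scaling and keeps sensitive operations in higher precision; it assumes a PyTorch build with float8 dtypes (2.1 or later).

# Per-tensor FP8 quantize/dequantize round trip (illustrative only).
import torch

def quantize_fp8(x):
    fp8_max = torch.finfo(torch.float8_e4m3fn).max        # 448.0 for e4m3
    scale = x.abs().max().clamp(min=1e-12) / fp8_max      # per-tensor scale
    return (x / scale).to(torch.float8_e4m3fn), scale

def dequantize_fp8(x_fp8, scale):
    return x_fp8.to(torch.float32) * scale

x = torch.randn(4, 8)
x_fp8, scale = quantize_fp8(x)
print((x - dequantize_fp8(x_fp8, scale)).abs().max())     # small rounding error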
Model Stats
- Total Parameters: 671 billion
- Activated Parameters: 37 billion
- Context Length: 128K
Guide: Running Locally
DeepSeek-V3 can be deployed locally using the following open-source software and hardware options:
- DeepSeek-Infer Demo: Provides a lightweight demo for FP8 and BF16 inference.
- SGLang: Supports DeepSeek-V3 in FP8 and BF16 inference modes.
- LMDeploy: Offers efficient FP8 and BF16 inference for local and cloud deployment.
- TensorRT-LLM: Supports BF16 inference, with FP8 support planned.
- vLLM: Supports FP8 and BF16 modes with tensor parallelism (see the example after this list).
- AMD GPU: Supports running DeepSeek-V3 on AMD GPUs via SGLang.
- Huawei Ascend NPU: Supports DeepSeek-V3 on Huawei Ascend devices.
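As one concrete example of the options above, here is a hedged sketch of offline inference through vLLM's Python API. The model ID, tensor-parallel degree, and sampling settings are placeholders, and a 671B-parameter model needs multi-GPU (typically multi-node) hardware, so consult the vLLM and DeepSeek-V3 documentation for an actual deployment.

# Hedged vLLM offline-inference sketch; values below are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",     # Hugging Face model ID
    tensor_parallel_size=8,              # size this to your GPU count
    trust_remote_code=True,
)
outputs = llm.generate(
    ["Explain Mixture-of-Experts in one sentence."],
    SamplingParams(temperature=0.6, max_tokens=128),
)
print(outputs[0].outputs[0].text)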
To convert FP8 weights to BF16, use the following command:
cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
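After conversion, the BF16 checkpoint can in principle be loaded through Hugging Face Transformers' trust_remote_code path, sketched below. The path is the --output-bf16-hf-path from the command above; tokenizer and config files may need to sit alongside the weights, and the backends listed earlier remain the officially supported inference routes.

# Hedged sketch: loading the converted BF16 weights with Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/path/to/bf16_weights"           # output of fp8_cast_bf16.py
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    device_map="auto",                   # requires the accelerate package
    trust_remote_code=True,
)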
Recommended Cloud GPUs: NVIDIA or AMD GPUs.
License
The code repository is licensed under the MIT License. Use of the DeepSeek-V3 models is subject to the Model License, which permits commercial use.