DeepSeek-V3
Introduction
DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671 billion total parameters (37 billion activated per token), designed for efficient inference and cost-effective training. It adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, both validated in its predecessor, DeepSeek-V2. The model outperforms many open-source models and is competitive with leading closed-source models. It was pre-trained on 14.8 trillion tokens, with full training consuming 2.788 million H800 GPU hours.
Architecture
DeepSeek-V3 introduces an auxiliary-loss-free strategy for load balancing, avoiding the performance degradation that auxiliary balancing losses typically cause. It also employs a Multi-Token Prediction (MTP) objective, which improves benchmark performance and can be repurposed for speculative decoding to speed up inference. Training uses FP8 mixed precision, validated for the first time at this scale, to improve efficiency and reduce cost.
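As a rough illustration of the auxiliary-loss-free idea, the sketch below (plain Python with made-up expert counts and scores, not the model's actual router) adds a per-expert bias to routing scores only when selecting the top-k experts, then nudges each bias down if that expert was over-loaded and up if it was under-loaded:

```python
import random

def topk_route(scores, bias, k):
    """Pick top-k experts by biased score; the bias affects selection only."""
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i] + bias[i], reverse=True)
    return order[:k]

def update_bias(bias, counts, gamma=0.01):
    """Aux-loss-free balancing: push overloaded experts down, underloaded up."""
    avg = sum(counts) / len(counts)
    return [b - gamma if c > avg else b + gamma for b, c in zip(bias, counts)]

random.seed(0)
n_experts, k, steps, tokens = 8, 2, 200, 64
bias = [0.0] * n_experts
for _ in range(steps):
    counts = [0] * n_experts
    for _ in range(tokens):
        # skewed affinities: expert 0 is systematically preferred
        scores = [random.gauss(0.5 if i == 0 else 0.0, 1.0)
                  for i in range(n_experts)]
        for e in topk_route(scores, bias, k):
            counts[e] += 1
    bias = update_bias(bias, counts)
# the over-preferred expert's bias is pushed negative to restore balance
print(bias[0])
```

Because the bias enters only the selection step (not the gating weights used to mix expert outputs), balance is enforced without adding a gradient term that would distort the main objective.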
Training
DeepSeek-V3's pre-training combines FP8 mixed precision with efficient cross-node communication for MoE training. Pre-training on 14.8 trillion tokens took 2.664 million H800 GPU hours. Post-training distills reasoning capabilities from DeepSeek-R1 into DeepSeek-V3, improving its reasoning while maintaining control over output style and length.
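Low-precision training of this kind typically relies on fine-grained, block-wise scaling so that an outlier in one block does not blow up the quantization error everywhere else. The toy sketch below mimics that idea with a uniform integer grid as a crude stand-in for the actual e4m3 FP8 format (block size and values are arbitrary):

```python
def quantize_block(values, levels=256):
    """Fake-quantize one block with its own scale (stand-in for FP8 e4m3)."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / (levels // 2 - 1)          # map [-amax, amax] onto the grid
    return [round(v / scale) for v in values], scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

# Each 4-element block keeps its own scale, so the large values in the
# second block do not destroy precision for the tiny values in the first.
x = [0.01, -0.02, 0.015, 0.005, 8.0, -7.5, 6.0, 5.5]
recon = []
for i in range(0, len(x), 4):
    q, s = quantize_block(x[i:i + 4])
    recon.extend(dequantize_block(q, s))
err = max(abs(a - b) for a, b in zip(x, recon))
print(err)  # worst-case error stays bounded by half a step of each block's grid
```

With a single shared scale, the 8.0 outlier would force a step size far too coarse for the 0.01-scale values; per-block scales keep both ranges representable.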
Guide: Running Locally
DeepSeek-V3 can be run locally using various hardware and software:
- DeepSeek-Infer Demo: Lightweight setup for FP8 and BF16 inference.
- SGLang: Supports FP8 and BF16 inference on NVIDIA and AMD GPUs.
- LMDeploy: Provides offline and online deployment options.
- TensorRT-LLM: Supports BF16 and INT4/INT8 quantization; FP8 support is planned.
- vLLM: Offers pipeline parallelism for distributed runs.
- AMD GPU and Huawei Ascend NPU: Support FP8 and BF16 inference modes.
To run locally, clone the DeepSeek-V3 repository, install dependencies, and follow the conversion and inference steps provided in the respective framework documentation.
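For the DeepSeek-Infer demo path, a minimal command sequence might look like the following; the paths are placeholders, and the script names and flags follow the repository's README at the time of writing and may change, so treat them as illustrative:

```shell
# Clone the repository and install the inference dependencies
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference
pip install -r requirements.txt

# Convert the Hugging Face checkpoint into the demo's sharded format
# (expert count and model-parallel degree here mirror the repo README)
python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 \
    --save-path /path/to/DeepSeek-V3-Demo \
    --n-experts 256 --model-parallel 16

# Run multi-node interactive generation with torchrun
torchrun --nnodes 2 --nproc-per-node 8 generate.py \
    --ckpt-path /path/to/DeepSeek-V3-Demo \
    --config configs/config_671B.json \
    --interactive --temperature 0.7 --max-new-tokens 200
```

Note that the full FP8 checkpoint is on the order of hundreds of gigabytes, so multi-GPU, multi-node hardware is required for this path.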
Suggested Cloud GPUs
- Google Colab with Tesla T4
- NVIDIA GPUs for TensorRT-LLM
- AMD GPUs for SGLang
- Huawei Ascend for MindIE
License
DeepSeek-V3 is licensed under the MIT License for code. The model's Base and Chat versions support commercial use under the Model License.