DeepSeek-V3-Base

deepseek-ai

DeepSeek-V3 Documentation

Introduction

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671 billion total parameters, 37 billion of which are activated per token. It incorporates Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture for efficient inference and cost-effective training. The model employs an auxiliary-loss-free strategy for load balancing and a multi-token prediction objective. It is pre-trained on 14.8 trillion tokens and further refined through supervised fine-tuning and reinforcement learning, achieving performance comparable to leading closed-source models.

Architecture

DeepSeek-V3 builds on the efficient architecture of DeepSeek-V2, retaining Multi-head Latent Attention (MLA) and DeepSeekMoE. It introduces an auxiliary-loss-free strategy for load balancing, avoiding the performance degradation that auxiliary balancing losses typically cause. The Multi-Token Prediction (MTP) objective enhances model performance and supports speculative decoding for faster inference.
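
To make the load-balancing idea concrete, below is a minimal sketch of bias-based, auxiliary-loss-free routing in the spirit of the paper: a per-expert bias influences only which experts are selected, and is nudged after each batch toward balanced load. The names and the step size GAMMA are illustrative assumptions, not the repository's actual code.

    import numpy as np

    N_EXPERTS, TOP_K, GAMMA = 8, 2, 0.001  # GAMMA: illustrative bias update rate

    bias = np.zeros(N_EXPERTS)  # adjusted directly, not trained via a loss

    def route(scores: np.ndarray) -> np.ndarray:
        """Select top-K experts per token; the bias affects selection only,
        while gating weights would still come from the unbiased scores."""
        return np.argsort(-(scores + bias), axis=-1)[:, :TOP_K]

    def update_bias(expert_counts: np.ndarray) -> None:
        """Raise the bias of underloaded experts, lower it for overloaded ones."""
        global bias
        bias += GAMMA * np.sign(expert_counts.mean() - expert_counts)

    scores = np.random.rand(16, N_EXPERTS)  # toy batch: 16 tokens, random affinities
    counts = np.bincount(route(scores).ravel(), minlength=N_EXPERTS)
    update_bias(counts)  # balancing happens here, with no auxiliary loss term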

Training

The model uses an FP8 mixed-precision training framework, demonstrating the feasibility of FP8 training at this scale. By co-designing algorithms, frameworks, and hardware, DeepSeek-V3 achieves efficient cross-node MoE training with nearly full computation-communication overlap. The pre-training phase consumes 2.664 million H800 GPU hours, with the subsequent training stages requiring only about 0.1 million GPU hours. Knowledge distillation from DeepSeek-R1 series models is applied to improve reasoning performance.
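
The core mechanic of FP8 training is scale-then-cast: values are rescaled into the narrow FP8 range before casting, and dequantized for precision-sensitive accumulation. A minimal per-tensor sketch follows, assuming a recent PyTorch build that exposes float8 dtypes; the framework described in the paper uses finer-grained (tile/block-wise) scaling.

    import torch

    FP8_MAX = 448.0  # largest normal value representable in float8_e4m3fn

    def to_fp8(x: torch.Tensor):
        """Quantize to FP8 with a per-tensor scale (toy version)."""
        scale = FP8_MAX / x.abs().max().clamp(min=1e-12)
        return (x * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn), scale

    def from_fp8(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        """Dequantize back to FP32 for accumulation-sensitive operations."""
        return x_fp8.to(torch.float32) / scale

    x = torch.randn(128, 128)
    x_fp8, scale = to_fp8(x)
    print("max abs error:", (x - from_fp8(x_fp8, scale)).abs().max().item())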

Model Stats

  • Total Parameters: 671 billion
  • Activated Parameters per Token: 37 billion
  • Context Length: 128K
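
These numbers alone set useful expectations for hardware, as the back-of-the-envelope arithmetic below shows (illustrative weight-only footprints; runtime memory for activations and caches comes on top).

    TOTAL_PARAMS = 671e9   # total parameters
    ACTIVE_PARAMS = 37e9   # parameters activated per token

    print(f"active fraction:      {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")  # ~5.5%
    print(f"FP8 weights (1 B/p):  {TOTAL_PARAMS * 1 / 1e9:.0f} GB")     # ~671 GB
    print(f"BF16 weights (2 B/p): {TOTAL_PARAMS * 2 / 1e9:.0f} GB")     # ~1342 GB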

Guide: Running Locally

DeepSeek-V3 can be deployed using various hardware and software solutions:

  1. DeepSeek-Infer Demo: A lightweight demo for FP8 and BF16 inference.
  2. SGLang: Supports FP8 and BF16 modes with competitive latency and throughput.
  3. LMDeploy: Facilitates efficient local and cloud deployment.
  4. TensorRT-LLM: Supports BF16 inference and INT4/8 quantization.
  5. vLLM: Enables tensor parallelism and pipeline parallelism (a usage sketch follows below).
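
As one concrete route, vLLM exposes an offline Python API. A minimal sketch, assuming a vLLM build with DeepSeek-V3 support and a node with enough GPU memory for the chosen tensor parallelism:

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="deepseek-ai/DeepSeek-V3",  # Hugging Face repo id
        tensor_parallel_size=8,           # shard weights across 8 GPUs
        trust_remote_code=True,
    )
    params = SamplingParams(temperature=0.7, max_tokens=200)
    outputs = llm.generate(["Explain Mixture-of-Experts in one paragraph."], params)
    print(outputs[0].outputs[0].text)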

Steps to Run Locally

  1. Clone the Repository:

    git clone https://github.com/deepseek-ai/DeepSeek-V3.git
    cd DeepSeek-V3/inference
    
  2. Install Dependencies:

    pip install -r requirements.txt
    
  3. Download Model Weights: Place weights in the /path/to/DeepSeek-V3 folder (a download sketch follows these steps).

  4. Convert Model Weights:

    python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
    
  5. Run the Model:

    torchrun --nnodes 2 --nproc-per-node 8 generate.py --node-rank $RANK --master-addr $ADDR --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200
    
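If the weights are fetched from Hugging Face, step 3 can be scripted; below is a sketch using huggingface_hub, where the repo id is assumed from the model card:

    # pip install huggingface_hub; repo id assumed from the model card
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="deepseek-ai/DeepSeek-V3-Base",
        local_dir="/path/to/DeepSeek-V3",
    )

Note that --model-parallel 16 in step 4 matches the launch topology in step 5: 2 nodes × 8 processes per node.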

Cloud GPU Suggestion

Given the model's size, multi-GPU cloud instances are recommended, such as nodes of 8× NVIDIA H800/H100-class or comparable AMD GPUs; the demo configuration above spans two 8-GPU nodes.

License

The code repository is released under the MIT License. Use of the DeepSeek-V3 models is governed by the Model License, which permits commercial use.
