bullerwins/DeepSeek-V3-GGUF

Introduction

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token. It adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture for efficient inference and cost-effective training. DeepSeek-V3 was pre-trained on 14.8 trillion tokens, followed by supervised fine-tuning and reinforcement learning. It outperforms other open-source models while requiring only 2.788 million H800 GPU hours for its full training.
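
The parameter numbers above follow directly from top-k expert routing: a router scores a pool of experts for each token, and only a handful are actually evaluated, so most parameters stay idle on any given token. Below is a minimal, illustrative sketch of this routing pattern; the expert count (256 routed experts) and top-k (8) mirror DeepSeek-V3's published configuration, but the toy dimensions, the linear router, and the softmax gating are assumptions for illustration, not the model's actual implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    N_EXPERTS, TOP_K = 256, 8   # DeepSeek-V3 routes each token to 8 of 256 experts
    D = 16                      # toy hidden size; the real model is vastly larger

    # Toy expert weight matrices and a linear router (illustrative shapes only).
    experts = rng.normal(size=(N_EXPERTS, D, D))
    router_w = rng.normal(size=(D, N_EXPERTS))

    def moe_forward(x):
        """Route one token to its top-k experts and mix their outputs."""
        scores = x @ router_w                 # affinity of the token to each expert
        top = np.argsort(scores)[-TOP_K:]     # indices of the k highest-scoring experts
        gates = np.exp(scores[top] - scores[top].max())
        gates /= gates.sum()                  # softmax gate weights over the chosen k
        # Only TOP_K expert matrices are touched; the rest stay idle for this token.
        return sum(g * (experts[e] @ x) for g, e in zip(gates, top))

    y = moe_forward(rng.normal(size=D))
    print(y.shape, f"- active experts per token: {TOP_K}/{N_EXPERTS}")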

Architecture

DeepSeek-V3 introduces an auxiliary-loss-free strategy for load balancing and a Multi-Token Prediction (MTP) training objective. Training uses FP8 mixed precision and overcomes the communication bottleneck of cross-node MoE training, significantly improving training efficiency.
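
To make the auxiliary-loss-free idea concrete: the DeepSeek-V3 paper describes adding a per-expert bias to the routing scores used for top-k selection (not for the gating weights), then nudging the bias down for overloaded experts and up for underloaded ones. The sketch below is a simplified rendition of that mechanism, with made-up sizes, a synthetic score skew, and an assumed update rule of ±GAMMA per step; it demonstrates the balancing behavior, not the exact training procedure.

    import numpy as np

    rng = np.random.default_rng(1)
    N_EXPERTS, TOP_K, GAMMA = 16, 2, 0.01    # toy sizes; GAMMA = bias update speed

    bias = np.zeros(N_EXPERTS)               # per-expert selection bias
    skew = np.linspace(0.0, 2.0, N_EXPERTS)  # synthetic imbalance: favors high-index experts

    for step in range(500):
        # Stand-in router affinities for a batch of 64 tokens.
        scores = rng.normal(size=(64, N_EXPERTS)) + skew
        # The bias affects which experts are selected, not their gate weights.
        chosen = np.argsort(scores + bias, axis=1)[:, -TOP_K:]
        load = np.bincount(chosen.ravel(), minlength=N_EXPERTS)
        # Push down overloaded experts, pull up underloaded ones.
        bias -= GAMMA * np.sign(load - load.mean())

    print("per-expert load after balancing:", load)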

Training

The model was pre-trained with FP8 mixed precision for cost-effectiveness, requiring 2.664 million H800 GPU hours. Post-training distills knowledge from the DeepSeek-R1 series of models to improve reasoning capabilities while maintaining control over output style and length.
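
As a rough illustration of what FP8 mixed precision involves, the snippet below performs a per-tensor quantize/dequantize round-trip, assuming PyTorch 2.1+ where the float8_e4m3fn dtype is available. This shows only the basic scaling idea; DeepSeek-V3's actual scheme uses finer-grained (tile/block-wise) scaling and keeps sensitive operations in higher precision.

    import torch

    E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

    def fp8_roundtrip(x: torch.Tensor) -> torch.Tensor:
        """Scale a tensor into the FP8 e4m3 range, cast down, and dequantize."""
        scale = E4M3_MAX / x.abs().max().clamp(min=1e-12)
        x_fp8 = (x * scale).to(torch.float8_e4m3fn)  # low-precision storage
        return x_fp8.to(torch.float32) / scale        # dequantized view

    x = torch.randn(4, 4) * 3
    x_hat = fp8_roundtrip(x)
    print("max abs round-trip error:", (x - x_hat).abs().max().item())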

Guide: Running Locally

To run DeepSeek-V3 locally, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/deepseek-ai/DeepSeek-V3.git
    
  2. Install Dependencies: Navigate to the inference folder and install the requirements:

    cd DeepSeek-V3/inference
    pip install -r requirements.txt
    
  3. Download and Convert Model Weights: Download the DeepSeek-V3 weights from Hugging Face, then convert them for the inference demo (the expert/rank split is sketched after step 4):

    python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
    
  4. Run Inference: Launch the command on both nodes, where $RANK is the node's index (0 or 1) and $ADDR is the master node's IP address:

    torchrun --nnodes 2 --nproc-per-node 8 generate.py --node-rank $RANK --master-addr $ADDR --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200
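
The flags in steps 3 and 4 are coupled: --model-parallel 16 matches the 2 nodes × 8 processes launched by torchrun, and the 256 routed experts are split across those 16 ranks. A quick sanity check of that partition (the even split is an assumption based on the flags above):

    N_EXPERTS = 256       # --n-experts
    MODEL_PARALLEL = 16   # --model-parallel = 2 nodes x 8 GPUs per node

    assert N_EXPERTS % MODEL_PARALLEL == 0, "experts must divide evenly across ranks"
    print(f"{N_EXPERTS // MODEL_PARALLEL} routed experts per rank")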
    

Cloud GPUs

For better performance, run the model on cloud GPUs. Both NVIDIA and AMD GPUs are supported: SGLang and vLLM can serve DeepSeek-V3 in FP8 and BF16 modes.
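
As one concrete example, vLLM exposes a Python API that can shard a BF16 checkpoint across multiple GPUs with tensor parallelism. The model identifier, parallel degree, and sampling settings below are illustrative placeholders to adapt to your hardware; FP8 serving additionally depends on GPU support.

    from vllm import LLM, SamplingParams

    # Illustrative settings: adjust the model path and tensor_parallel_size to your setup.
    llm = LLM(
        model="deepseek-ai/DeepSeek-V3",   # or a local checkpoint path
        tensor_parallel_size=8,            # number of GPUs to shard across
        dtype="bfloat16",                  # BF16 mode
        trust_remote_code=True,
    )

    params = SamplingParams(temperature=0.7, max_tokens=200)
    outputs = llm.generate(["Explain Mixture-of-Experts models briefly."], params)
    print(outputs[0].outputs[0].text)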

License

The code is licensed under the MIT License. The DeepSeek-V3 models are subject to a separate Model License, which permits commercial use.
