DeepSeek-V2 Chat
Introduction
DeepSeek-V2 is an advanced Mixture-of-Experts (MoE) language model designed for economical training and efficient inference, featuring 236 billion parameters with 21 billion activated per token. It improves upon its predecessor, DeepSeek 67B, by reducing training costs by 42.5%, decreasing KV cache usage by 93.3%, and increasing generation throughput by 5.76 times. The model is pretrained on 8.1 trillion tokens and further refined through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL).
Architecture
DeepSeek-V2 incorporates innovative architectures for performance optimization:
- Multi-head Latent Attention (MLA): Uses low-rank joint key-value compression to shrink the KV cache bottleneck and improve inference efficiency (see the sketch after this list).
- DeepSeekMoE Architecture: A high-performance Mixture-of-Experts setup that allows for training stronger models at reduced costs.
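The sketch below illustrates the core idea behind MLA's joint compression: instead of caching full per-head keys and values, the model caches one small latent vector per token and reconstructs keys and values from it at attention time. Module names and dimensions here are hypothetical, chosen only to make the mechanism concrete; this is not DeepSeek-V2's actual implementation.

```python
# Hypothetical, simplified illustration of low-rank joint KV compression
# (the idea behind MLA); names and sizes are illustrative, not DeepSeek-V2's.
import torch
import torch.nn as nn

class LowRankKV(nn.Module):
    def __init__(self, d_model=4096, d_latent=512, n_heads=32, d_head=128):
        super().__init__()
        # Down-projection: hidden state -> shared KV latent. Only the latent
        # is cached, so the cache holds d_latent values per token instead of
        # 2 * n_heads * d_head.
        self.down_kv = nn.Linear(d_model, d_latent, bias=False)
        # Up-projections recompute per-head keys/values from the latent.
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)

    def forward(self, h):                   # h: (batch, seq, d_model)
        c_kv = self.down_kv(h)              # cached: (batch, seq, d_latent)
        k = self.up_k(c_kv)                 # (batch, seq, n_heads * d_head)
        v = self.up_v(c_kv)
        return c_kv, k, v
```

With these illustrative sizes, the cache per token shrinks from 2 × 32 × 128 = 8192 values to 512, about a 94% reduction, which is on the order of the 93.3% figure quoted above.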
Training
The model underwent extensive pretraining on a high-quality corpus, followed by SFT and RL, to maximize its capabilities. It exhibits superior performance on standard benchmarks and excels in open-ended generation evaluations.
Guide: Running Locally
To run DeepSeek-V2 locally, follow these steps:
- Hardware Requirements: Inference in BF16 format requires 8×80GB GPUs (the 236B parameters alone occupy roughly 472 GB at 2 bytes per parameter).
- Using Hugging Face Transformers (see the sketch below):
  - Install the `transformers` library.
  - Load the model using `AutoTokenizer` and `AutoModelForCausalLM`.
  - Set `max_memory` based on your GPU configuration.
  - For text or chat completion, prepare input data and generate outputs using the model.
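A minimal sketch of the Transformers path described above, assuming the checkpoint id `deepseek-ai/DeepSeek-V2-Chat` and an 8×80GB node; the `max_memory` values are illustrative and should be adjusted to your hardware:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "deepseek-ai/DeepSeek-V2-Chat"  # assumed checkpoint id
max_memory = {i: "75GB" for i in range(8)}   # illustrative per-GPU cap

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # BF16 inference, as noted above
    device_map="auto",
    max_memory=max_memory,
    trust_remote_code=True,       # the model uses custom architecture code
)

# Chat completion: format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Summarize Mixture-of-Experts in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```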
- Using vLLM (Recommended, see the sketch below):
  - Integrate vLLM with the necessary Pull Request for enhanced performance.
  - Load the model using `LLM` from vLLM for efficient inference.
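A minimal sketch of offline inference via vLLM's `LLM` class, again assuming the `deepseek-ai/DeepSeek-V2-Chat` checkpoint id and the eight GPUs noted above; it requires a vLLM build that includes the Pull Request mentioned in the list:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Chat",  # assumed checkpoint id
    tensor_parallel_size=8,                # shard across the 8 GPUs noted above
    trust_remote_code=True,
    max_model_len=8192,                    # illustrative context limit
)

sampling_params = SamplingParams(temperature=0.3, max_tokens=256)

# vLLM batches prompts internally; pass plain strings for text completion.
prompts = ["Explain the benefit of a Mixture-of-Experts architecture in one paragraph."]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```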
Cloud GPUs: If local resources are insufficient, consider cloud providers such as AWS, Google Cloud, or Azure for access to high-memory GPUs.
License
The code is licensed under the MIT License. Usage of the DeepSeek-V2 Base/Chat models is governed by a specific Model License. Both licenses support commercial use.