DeepSeek-V2 Chat

deepseek-ai

Introduction

DeepSeek-V2 is an advanced Mixture-of-Experts (MoE) language model designed for economical training and efficient inference, with 236 billion total parameters of which 21 billion are activated per token. Compared with its predecessor, DeepSeek 67B, it cuts training costs by 42.5%, shrinks KV cache usage by 93.3%, and boosts maximum generation throughput to 5.76 times. The model is pretrained on 8.1 trillion tokens and further refined through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL).

Architecture

DeepSeek-V2 incorporates innovative architectures for performance optimization:

  • Multi-head Latent Attention (MLA): Uses low-rank key-value joint compression to shrink the key-value cache bottleneck and thereby improve inference efficiency (a toy sketch of the idea follows this list).
  • DeepSeekMoE Architecture: A high-performance Mixture-of-Experts setup that allows for training stronger models at reduced costs.
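
To make the compression idea concrete, here is a minimal PyTorch sketch of low-rank key-value joint compression, the core mechanism behind MLA. All dimensions and layer names are hypothetical, and this is not the actual DeepSeek-V2 implementation; it only illustrates why caching one small latent vector per token is cheaper than caching full per-head keys and values.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions, chosen only for illustration.
d_model, d_latent, n_heads, d_head = 4096, 512, 32, 128

# Joint down-projection: one small latent per token replaces full K/V.
down_proj = nn.Linear(d_model, d_latent, bias=False)
# Up-projections reconstruct keys and values at attention time.
up_proj_k = nn.Linear(d_latent, n_heads * d_head, bias=False)
up_proj_v = nn.Linear(d_latent, n_heads * d_head, bias=False)

hidden = torch.randn(1, 16, d_model)   # (batch, seq_len, d_model)
latent_kv = down_proj(hidden)          # only this tensor needs to be cached
keys = up_proj_k(latent_kv)            # (1, 16, n_heads * d_head)
values = up_proj_v(latent_kv)

# Per token, the cache holds d_latent = 512 values instead of
# 2 * n_heads * d_head = 8192 for uncompressed keys plus values.
print(latent_kv.shape, keys.shape, values.shape)
```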

Training

The model underwent extensive pretraining on a high-quality corpus, followed by SFT and RL, to maximize its capabilities. It exhibits superior performance on standard benchmarks and excels in open-ended generation evaluations.

Guide: Running Locally

To run DeepSeek-V2 locally, follow these steps:

  1. Hardware Requirements: Inference in BF16 format requires 8 × 80 GB GPUs.
  2. Using Hugging Face Transformers:
    • Install the transformers library.
    • Load the model using AutoTokenizer and AutoModelForCausalLM.
    • Set max_memory based on your GPU configuration.
    • For text or chat completion, prepare input data and generate outputs using the model (see the first sketch after this list).
  3. Using vLLM (Recommended):
    • Merge the required Pull Request into your vLLM codebase to enable DeepSeek-V2 support.
    • Load the model with vLLM's LLM class for efficient inference (see the second sketch after this list).
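
Below is a minimal chat-completion sketch with Hugging Face Transformers, assuming the Hub id deepseek-ai/DeepSeek-V2-Chat and eight GPUs; the 75GB per-device budget in max_memory is an illustrative assumption, so adjust it to your hardware.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2-Chat"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Assumed budget: eight 80 GB GPUs, leaving headroom on each device.
max_memory = {i: "75GB" for i in range(8)}
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    max_memory=max_memory,
)
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [{"role": "user", "content": "Write a piece of quicksort code in C++"}]
input_tensor = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

# Decode only the newly generated tokens, not the prompt.
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
```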
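
And a corresponding sketch with vLLM, assuming a vLLM build that already includes the required Pull Request; the model id, context length, and sampling settings are illustrative.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "deepseek-ai/DeepSeek-V2-Chat"  # assumed Hub id
max_model_len, tp_size = 8192, 8  # illustrative context length, 8-way tensor parallelism

tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,
)
sampling_params = SamplingParams(
    temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id]
)

messages_list = [[{"role": "user", "content": "Who are you?"}]]
prompt_token_ids = [
    tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    for messages in messages_list
]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)
print([output.outputs[0].text for output in outputs])
```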

Cloud GPUs: Consider using cloud services like AWS, Google Cloud, or Azure for accessing powerful GPUs if local resources are insufficient.

License

The code is licensed under the MIT License. Use of the DeepSeek-V2 Base/Chat models is governed by the accompanying Model License. Both licenses permit commercial use.
