DeepSeek-Coder-V2-Lite-Instruct

deepseek-ai

Introduction

DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that delivers performance comparable to GPT4-Turbo on code-specific tasks. It is built upon DeepSeek-V2 and further pre-trained on an additional 6 trillion tokens, which strengthens its coding and mathematical reasoning abilities. DeepSeek-Coder-V2 supports 338 programming languages and extends the context length from 16K to 128K tokens.

Architecture

DeepSeek-Coder-V2 is available in two configurations built on the DeepSeekMoE framework: a 16B-parameter model with 2.4B active parameters and a 236B-parameter model with 21B active parameters. Both sizes are released as Base and Instruct versions.

Training

The model is pre-trained from an intermediate checkpoint of DeepSeek-V2 on an additional 6 trillion tokens. This extended pre-training allows DeepSeek-Coder-V2 to perform well on coding and mathematical reasoning tasks while maintaining comparable performance on general language tasks.

Guide: Running Locally

To run DeepSeek-Coder-V2 locally, follow these steps:

  1. Hardware Requirements: BF16 inference of the full 236B model requires 8 × 80 GB GPUs; the 16B Lite model is far less demanding and fits on a single high-memory GPU.
  2. Installation: Use Hugging Face's Transformers library.
  3. Code Completion:
    • Import the necessary libraries and load the model and tokenizer.
    • Provide an input prompt and run generation to produce code (see the first sketch after this list).
  4. Code Insertion and Chat Completion:
    • Follow similar steps as for code completion; chat-based use additionally applies the tokenizer's chat template (see the second sketch after this list).
  5. vLLM Inference (Recommended):
    • Merge the specified pull request into your vLLM codebase, then run inference as in the third sketch after this list.
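
A minimal sketch of code completion with the Transformers library follows. The repository id deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct, the prompt, and the generation settings are assumptions rather than part of the original guide, and a CUDA GPU with enough memory for BF16 weights is presumed.

    # Code completion with Transformers (sketch; repo id and generation
    # settings are assumptions, not taken from the original guide).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
    ).cuda()

    input_text = "# write a quick sort algorithm in Python\n"
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))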
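
For chat completion, the tokenizer's chat template is applied before generation. The sketch below continues with the model and tokenizer loaded above; the message content and decoding parameters are illustrative assumptions.

    # Chat completion sketch (reuses the model and tokenizer loaded above).
    messages = [
        {"role": "user", "content": "Write a quick sort algorithm in Python."}
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(
        inputs,
        max_new_tokens=512,
        do_sample=False,
        eos_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))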
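
Once a vLLM build that supports the DeepSeek-Coder-V2 architecture is in place (via the pull request referenced above), offline inference can look roughly like this sketch. The model id, reduced context length, and sampling settings are assumptions chosen for illustration.

    # vLLM offline inference sketch (assumes a vLLM version that supports
    # the DeepSeek-Coder-V2 architecture).
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",  # assumed repo id
        trust_remote_code=True,
        max_model_len=8192,  # reduce from 128K to limit KV-cache memory
    )
    sampling_params = SamplingParams(temperature=0.3, max_tokens=256)

    prompts = ["# write a quick sort algorithm in Python\n"]
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(output.outputs[0].text)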

Cloud GPUs from providers like AWS, Google Cloud, or Azure are recommended for running large models efficiently.

License

The code repository is licensed under the MIT License. The DeepSeek-Coder-V2 models are subject to a separate Model License, allowing for commercial use. Models include both Base and Instruct versions.
