DeepSeek-Coder-V2-Lite-Base

deepseek-ai

Introduction

DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that rivals closed-source models like GPT4-Turbo in code-specific tasks. It builds on DeepSeek-V2 with additional pre-training on 6 trillion tokens, enhancing its coding and mathematical reasoning capabilities while supporting a wide range of programming languages (increased from 86 to 338) and extending the context length from 16K to 128K.

Architecture

DeepSeek-Coder-V2 is available in two sizes, 16B and 236B total parameters, both built on the DeepSeekMoE framework, with 2.4B and 21B active parameters respectively. Base and Instruct variants are released for each size.

Training

The continued pre-training from an intermediate checkpoint of DeepSeek-V2 allows DeepSeek-Coder-V2 to significantly advance in code-related tasks, reasoning, and general language tasks.

Guide: Running Locally

To run DeepSeek-Coder-V2 locally, follow these steps:

  1. Prerequisites: BF16 inference with the full 236B model requires 8×80 GB GPUs; the 16B Lite model is far lighter (roughly 32 GB of BF16 weights) and fits on a single 80 GB GPU.
  2. Inference with Transformers:
    • Install Hugging Face's Transformers library.
    • Load the model and tokenizer (a minimal sketch follows this list).
  3. Example Tasks:
    • Code Completion: Generate code snippets such as a quicksort algorithm (see the completion sketch below).
    • Code Insertion: Fill in missing parts of code with fill-in-the-middle (FIM) prompting, using the sentinel tokens documented in the upstream README.
    • Chat Completion: Run conversational tasks with the provided chat template; this applies to the Instruct variant (see the chat sketch below).
  4. Inference with vLLM (Recommended):
    • Integrate the provided Pull Request into your vLLM codebase.
    • Run inference with vLLM using the specified parameters (see the vLLM sketch below).
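A minimal code-completion sketch with Transformers is shown below. It assumes a CUDA GPU with enough memory for the BF16 weights; the prompt and generation settings are illustrative, not fixed requirements.

```python
# Minimal sketch: code completion with Hugging Face Transformers (assumes a CUDA GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16 weights, per the prerequisites above
    trust_remote_code=True,
).cuda()

# Plain completion prompt for the base model, e.g. a quicksort algorithm.
prompt = "# write a quick sort algorithm in python\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```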
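Chat completion targets the Instruct variant rather than this base checkpoint. The sketch below assumes the deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct checkpoint and relies on the chat template shipped with its tokenizer.

```python
# Sketch: chat completion via the tokenizer's built-in chat template
# (assumed checkpoint: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda()

messages = [{"role": "user", "content": "Write a quick sort algorithm in Python."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```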
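For vLLM, the following offline-inference sketch assumes a vLLM build that already includes DeepSeek-V2 MoE support (i.e. the Pull Request mentioned above has been merged or applied); the context length and sampling parameters are illustrative.

```python
# Sketch: offline inference with vLLM (assumes DeepSeek-V2 support in your vLLM build).
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
    trust_remote_code=True,
    max_model_len=8192,  # illustrative; the model supports context lengths up to 128K
)
sampling = SamplingParams(temperature=0.0, max_tokens=256)

prompts = ["# write a quick sort algorithm in python\n"]
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```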

If local hardware is insufficient, cloud GPUs from providers such as AWS, Google Cloud, or Azure are a practical alternative.

License

The code repository is licensed under the MIT License. The DeepSeek-Coder-V2 Base/Instruct models are subject to a separate Model License, which allows for commercial use.
