DeepSeek-Coder-V2-Lite-Base
Introduction
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that rivals closed-source models like GPT4-Turbo in code-specific tasks. It builds on DeepSeek-V2 with additional pre-training on 6 trillion tokens, enhancing its coding and mathematical reasoning capabilities while supporting a wide range of programming languages (increased from 86 to 338) and extending the context length from 16K to 128K.
Architecture
DeepSeek-Coder-V2 is built on the DeepSeekMoE framework and comes in two sizes, 16B and 236B total parameters, with 2.4B and 21B active parameters respectively. Both sizes are released as base and instruct models.
Training
Continued pre-training from an intermediate checkpoint of DeepSeek-V2 substantially improves DeepSeek-Coder-V2's coding and mathematical reasoning capabilities while maintaining comparable performance on general language tasks.
Guide: Running Locally
To run DeepSeek-Coder-V2 locally, follow these steps:
- Prerequisites: Running the full DeepSeek-Coder-V2 (236B) in BF16 for inference requires 8×80GB GPUs; the 16B Lite model has a much smaller footprint and fits on a single 80GB GPU.
- Inference with Transformers:
- Install Hugging Face's Transformers library.
- Load the model and tokenizer, then generate from a prompt (a minimal sketch follows below).
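A minimal sketch of loading the Lite Base model with Transformers; the dtype and device settings here are illustrative and should be adjusted to your hardware:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Base"

# trust_remote_code is needed because the model ships custom modeling code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,   # BF16 halves memory use vs. FP32
    device_map="auto",            # place layers on the available GPU(s)
)
```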
- Example Tasks:
- Code Completion: Use the model to generate code snippets such as a quicksort algorithm (see the sketch after this list).
- Code Insertion: Fill in missing parts of code.
- Chat Completion: Implement conversation tasks using the provided chat template.
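Building on the model and tokenizer loaded above, a short code-completion sketch for the quicksort prompt; the generation settings are illustrative:

```python
# Code completion: the base model simply continues a raw prompt
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Code insertion relies on the model's fill-in-the-middle sentinel tokens (take their exact spelling from the model card or tokenizer), and chat completion applies to the Instruct variant, whose chat template can be applied with `tokenizer.apply_chat_template`.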
- Inference with vLLM (Recommended):
- Integrate the provided Pull Request into your vLLM codebase.
- Use vLLM for offline batched inference with your chosen sampling parameters (see the sketch below).
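A sketch of offline inference with vLLM, assuming your vLLM build already contains the DeepSeek-V2 MoE support referenced above; the context length and sampling values are illustrative:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
    trust_remote_code=True,
    max_model_len=8192,  # cap the 128K context to limit KV-cache memory
)
sampling = SamplingParams(temperature=0.2, top_p=0.95, max_tokens=256)

outputs = llm.generate(["#write a quick sort algorithm"], sampling)
print(outputs[0].outputs[0].text)
```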
For cloud-based execution, consider renting GPUs from providers such as AWS, Google Cloud, or Azure.
License
The code repository is licensed under the MIT License. The DeepSeek-Coder-V2 Base/Instruct models are subject to a separate Model License, which allows for commercial use.