DeepSeek-Coder-V2-Instruct
deepseek-ai
Introduction
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model designed to perform comparably to GPT4-Turbo in code-specific tasks. It is pre-trained on an additional 6 trillion tokens, enhancing its coding and mathematical reasoning capabilities while maintaining general language task performance. The model extends its programming language support from 86 to 338 and increases context length from 16K to 128K.
Architecture
DeepSeek-Coder-V2 is based on the DeepSeekMoE framework and is available in 16B and 236B parameter versions, with 2.4B and 21B active parameters, respectively. Both versions handle a context length of up to 128K tokens. These advancements enable performance in coding and mathematical reasoning benchmarks that rivals closed-source models.
Training
DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 using an additional 6 trillion tokens. This extensive pre-training enhances its ability to handle a wide range of programming languages and improves its reasoning and general capabilities in code-related tasks.
Guide: Running Locally
To run DeepSeek-Coder-V2 locally, BF16-format inference requires 8x80GB GPUs, so cloud GPUs are recommended.
Inference with Hugging Face's Transformers
- Install the Transformers library from Hugging Face.
- Load the tokenizer and model using AutoTokenizer and AutoModelForCausalLM.
- Use the model for tasks like code completion, code insertion, and chat completion (a chat completion sketch follows the code completion example below).
Example for code completion:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the Lite base model in BF16 on GPU
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# Prompt the base model with a comment and let it complete the code
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
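For chat completion, the following is a minimal sketch using the Instruct variant and the tokenizer's built-in chat template; the deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct checkpoint and the generation settings shown here are assumptions, so adapt them to your setup.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Sketch: chat completion with the Lite Instruct checkpoint (model ID assumed)
model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# Build the prompt from chat messages using the tokenizer's chat template
messages = [{"role": "user", "content": "Write a quick sort algorithm in Python."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# Generate a reply and decode only the newly generated tokens
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))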
Inference with vLLM (Recommended)
- Integrate the vLLM library into your codebase.
- Load the model and tokenizer using vLLM and AutoTokenizer.
- Use the model for generating responses to user queries, as shown in the sketch below.
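The following is a minimal sketch of offline chat inference with vLLM. It assumes a vLLM release that supports the DeepSeek-Coder-V2 architecture, the deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct checkpoint, and a single GPU; adjust tensor_parallel_size and max_model_len for larger variants.

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Sketch: offline chat inference with vLLM (model ID and sizes are assumptions)
model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
llm = LLM(model=model_id, trust_remote_code=True, tensor_parallel_size=1, max_model_len=8192)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

# Render the chat messages into a plain-text prompt with the chat template
messages = [{"role": "user", "content": "Write a quick sort algorithm in Python."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Generate and print the first completion
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)

Rendering the messages through the chat template before calling llm.generate keeps the prompt format consistent with the Transformers path above.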
License
The code repository is licensed under the MIT License. Use of the DeepSeek-Coder-V2 Base/Instruct models is subject to the Model License, and the DeepSeek-Coder-V2 series supports commercial use.