DeepSeek-Coder-V2-Base

Introduction
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model designed to perform on par with closed-source models such as GPT-4 Turbo on code-specific tasks. It is pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens, strengthening its coding and mathematical reasoning abilities. The model expands language coverage from 86 to 338 programming languages and supports a context length of up to 128K tokens.
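As a quick sanity check, the advertised context window can be read from the model's Hugging Face configuration without downloading the weights. The snippet below is a minimal sketch and assumes the checkpoint exposes the standard `max_position_embeddings` field:

```python
from transformers import AutoConfig

# Load only the configuration (no weights) to inspect the context window.
config = AutoConfig.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True
)
# Expected to reflect the ~128K-token context window advertised for the model.
print(config.max_position_embeddings)
```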
Architecture
DeepSeek-Coder-V2 is available in configurations with 16B and 236B total parameters, of which only 2.4B and 21B, respectively, are active per token. The architecture is based on the DeepSeekMoE framework, which keeps computation efficient by activating only a small subset of expert parameters for each token during inference.
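To illustrate the general idea (this is not DeepSeekMoE's actual routing, and all names here are illustrative), the sketch below implements a minimal top-k expert gate in PyTorch: each token is scored against every expert, but only the k best-scoring experts run, so only a fraction of the layer's parameters participate in any forward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Minimal top-k mixture-of-experts layer (illustrative, not DeepSeekMoE)."""

    def __init__(self, d_model: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Score every expert, keep only the top-k per token.
        scores = self.gate(x)                           # (tokens, num_experts)
        weights, indices = scores.topk(self.k, dim=-1)  # (tokens, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    # Only the selected experts' parameters are used for these tokens.
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route 10 tokens of width 64 through the layer.
layer = TopKMoELayer(d_model=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```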
Training
The model undergoes additional pre-training on a substantial dataset of 6 trillion tokens. This process enhances its proficiency in code and mathematical reasoning tasks while maintaining strong performance in general language processing tasks.
Guide: Running Locally
To run DeepSeek-Coder-V2 locally, you can use Hugging Face's Transformers library. Note that the hardware requirements differ sharply by model size: inference with the full 236B model in BF16 requires 80GB*8 GPUs, while the 16B Lite variants used in the examples below fit comfortably on a single 80GB GPU.
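A rough rule of thumb explains the requirement: BF16 stores two bytes per parameter, so the weights alone need about 2 × parameter-count bytes, before accounting for the KV cache and activations. A back-of-the-envelope check (illustrative figures only):

```python
def bf16_weight_gib(num_params: float) -> float:
    """Approximate GiB needed just for BF16 weights (2 bytes per parameter)."""
    return num_params * 2 / 1024**3

print(f"{bf16_weight_gib(236e9):.0f} GiB")  # ~440 GiB -> needs multiple 80GB GPUs
print(f"{bf16_weight_gib(16e9):.0f} GiB")   # ~30 GiB  -> fits on a single 80GB GPU
```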
Steps for Running Locally
- Code Completion
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).cuda()

input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# Generate a completion; max_length caps prompt plus completion tokens.
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
- Inference with vLLM (Recommended)
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 1
model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,
)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

# Each conversation is a list of messages; several can be batched for parallel generation.
messages_list = [
    [{"role": "user", "content": "write a quick sort algorithm in python."}],
]
prompt_token_ids = [
    tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    for messages in messages_list
]
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```
License
The code of DeepSeek-Coder-V2 is licensed under the MIT License. The models, including Base and Instruct versions, are subject to a separate Model License that supports commercial use.