DeepSeek-Coder-V2-Base

Introduction

DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model designed to perform on par with closed-source models such as GPT-4 Turbo on code-specific tasks. It is further pre-trained from an intermediate checkpoint of DeepSeek-V2 on an additional 6 trillion tokens, strengthening its coding and mathematical reasoning abilities. The model expands the number of supported programming languages from 86 to 338 and extends the context length to 128K tokens.

Architecture

DeepSeek-Coder-V2 is available in two configurations: 16B and 236B total parameters, of which only 2.4B and 21B are active, respectively. The architecture follows the DeepSeekMoE framework, which keeps inference efficient by routing each token through only a small subset of the experts rather than the full parameter set.
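
The total-versus-active split is reflected in the published model configuration. Below is a minimal sketch that loads the config of the Lite checkpoint and prints its MoE-related fields; the exact field names (n_routed_experts, n_shared_experts, num_experts_per_tok) are assumptions based on DeepSeek-V2-style configs, so missing fields are reported rather than assumed.

    # Sketch: inspect the MoE layout of the Lite checkpoint.
    # The MoE field names below are assumptions (DeepSeek-V2-style config) and may not
    # match the published config exactly; getattr falls back gracefully if they differ.
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
    for field in ("hidden_size", "num_hidden_layers",
                  "n_routed_experts", "n_shared_experts", "num_experts_per_tok"):
        print(field, getattr(config, field, "not present"))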

Training

The model is further pre-trained on an additional 6 trillion tokens. This continued pre-training substantially improves its proficiency in coding and mathematical reasoning while preserving strong performance on general language tasks.

Guide: Running Locally

DeepSeek-Coder-V2 can be run locally with Hugging Face's Transformers library. Running the full 236B model in BF16 requires roughly 80GB*8 GPUs, so cloud GPUs are recommended for that variant; the 16B Lite checkpoints used in the examples below fit on a single high-memory GPU. A rough memory estimate is sketched below.
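
As a back-of-the-envelope check (an illustration only, not an official sizing guide), BF16 weights take about 2 bytes per parameter; the figures below cover weights only and ignore the KV cache and activations.

    # Rough BF16 weight-memory estimate: ~2 bytes per parameter.
    # Ballpark figures for model weights only; KV cache and activations need extra memory.
    BYTES_PER_PARAM_BF16 = 2
    for name, params in [("DeepSeek-Coder-V2 (236B)", 236e9),
                         ("DeepSeek-Coder-V2-Lite (16B)", 16e9)]:
        gigabytes = params * BYTES_PER_PARAM_BF16 / 1e9
        print(f"{name}: ~{gigabytes:.0f} GB of weights")
    # ~472 GB for the 236B model (hence 80GB*8 GPUs), ~32 GB for the Lite model.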

Steps for Running Locally

  1. Code Completion

    # Load the Lite base model and its tokenizer in BF16 on the GPU
    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch
    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

    # Complete code from a plain-text prompt
    input_text = "#write a quick sort algorithm"
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_length=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  2. Inference with vLLM (Recommended)

    # Load the tokenizer and start a vLLM engine for the Lite instruct model
    from transformers import AutoTokenizer
    from vllm import LLM, SamplingParams
    max_model_len, tp_size = 8192, 1
    model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
    sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

    # Each entry in messages_list is one conversation (a list of chat messages)
    messages_list = [
        [{"role": "user", "content": "write a quick sort algorithm in python."}],
    ]
    prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

    # Generate completions for all conversations in a single batched call
    outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)
    generated_text = [output.outputs[0].text for output in outputs]
    print(generated_text)
    

License

The code of DeepSeek-Coder-V2 is released under the MIT License. The models themselves, including the Base and Instruct variants, are governed by a separate Model License that permits commercial use.
