deepseek-coder-6.7b-instruct

deepseek-ai

Introduction

DeepSeek Coder is a series of code language models trained from scratch on 2 trillion tokens, of which 87% is code and 13% is natural-language data in English and Chinese. These models support project-level code completion and infilling, achieving state-of-the-art performance on several programming benchmarks.

Architecture

The models are available in multiple sizes, ranging from 1.3B to 33B parameters. The architecture uses a 16K context window and is trained with an additional fill-in-the-blank objective to support project-level code completion and infilling. The 6.7B variant, deepseek-coder-6.7b-instruct, is further fine-tuned on 2B tokens of instruction data.
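As a sketch of how the fill-in-the-blank capability is used in practice: the base checkpoints of the series (e.g. deepseek-ai/deepseek-coder-6.7b-base) accept fill-in-the-middle prompts built from sentinel tokens, while the instruct model is tuned for chat-style prompts instead. The snippet below assumes the <｜fim▁begin｜>, <｜fim▁hole｜>, and <｜fim▁end｜> sentinels documented in the DeepSeek Coder repository:

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

    # The prefix and suffix surround the hole the model should fill in.
    input_text = """<｜fim▁begin｜>def fib(n):
<｜fim▁hole｜>
    return fib(n - 1) + fib(n - 2)<｜fim▁end｜>"""
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens, i.e. the filled-in middle.
    print(tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True))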

Training

DeepSeek Coder models are trained from scratch on a large corpus of code and natural-language data spanning English and Chinese. This mix optimizes the models for diverse coding environments, and performance is reported on benchmarks such as HumanEval, MultiPL-E, MBPP, DS-1000, and APPS.

Guide: Running Locally

To run the deepseek-coder-6.7b-instruct model locally, follow these steps:

  1. Install the required libraries (PyTorch is needed for the dtype and GPU calls below):

    pip install torch transformers
    
  2. Import the necessary libraries and load the model (torch must be imported for the bfloat16 dtype; a CUDA GPU is assumed):

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
    
  3. Prepare your input and generate output:

    messages = [{'role': 'user', 'content': "write a quick sort algorithm in python."}]
    # apply_chat_template wraps the conversation in the model's chat markup.
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    # Greedy decoding; tokenizer.eos_token_id is the id of the <|EOT|> token.
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
    # Strip the prompt tokens so only the model's answer is printed.
    print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
    
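The chat template also handles multi-turn conversations. A minimal continuation of the snippet above (the follow-up question is illustrative):

    # Append the model's reply to the history, then ask a follow-up question.
    reply = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
    messages += [
        {'role': 'assistant', 'content': reply},
        {'role': 'user', 'content': "now sort in descending order instead."},
    ]
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, eos_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))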

For optimal performance, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
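If local GPU memory is tight, a quantized load can reduce the footprint. A minimal sketch, assuming the bitsandbytes package is installed alongside transformers and a CUDA GPU is available (4-bit quantization trades some accuracy for memory):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # 4-bit NF4 quantization with bfloat16 compute; the 6.7B weights then need
    # roughly 4 GB of VRAM instead of ~14 GB in bfloat16.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        "deepseek-ai/deepseek-coder-6.7b-instruct",
        trust_remote_code=True,
        quantization_config=bnb_config,
        device_map="auto",  # place layers on the available device(s) automatically
    )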

License

The code repository is licensed under the MIT License, while the model weights are subject to the Model License, which permits commercial use. For more details, refer to the LICENSE-MODEL file in the repository.
