deepseek coder 6.7b instruct
deepseek-ai
Introduction
DeepSeek Coder consists of a series of code language models trained from scratch on 2 trillion tokens, which include 87% code and 13% natural language data in English and Chinese. These models support project-level code completion and infilling, achieving state-of-the-art performance on several programming benchmarks.
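As a quick check on the proportions stated above, here is a small sketch that splits the 2-trillion-token total by the 87% / 13% shares. It uses integer arithmetic to avoid floating-point rounding. The resulting counts are only as precise as the rounded percentages in the source.

```python
TOTAL_TOKENS = 2_000_000_000_000  # 2 trillion tokens, as stated above
# Percent shares from the text; "natural_language" covers English and Chinese
SHARES_PERCENT = {"code": 87, "natural_language": 13}


def split_tokens(total: int, shares_percent: dict) -> dict:
    """Return the approximate token count for each share of the corpus."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must add up to 100 percent")
    # Multiply before dividing so the integer result stays exact
    return {name: total * pct // 100 for name, pct in shares_percent.items()}


for name, count in split_tokens(TOTAL_TOKENS, SHARES_PERCENT).items():
    print(f"{name}: {count / 1e12:.2f}T tokens")
```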
Architecture
The models come in sizes ranging from 1B to 33B parameters. They use a 16K-token context window and are trained with an additional fill-in-the-blank task that strengthens code completion and infilling. The 6.7B-parameter instruction model, deepseek-coder-6.7b-instruct, is fine-tuned on 2B tokens of instruction data.
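The fill-in-the-blank task (often called fill-in-the-middle) trains the model to generate a missing span from the code before and after it. The sketch below shows one way to assemble such a prompt. It assumes a prefix-hole-suffix layout. The sentinel strings FIM_BEGIN, FIM_HOLE and FIM_END are hypothetical placeholders, not the model's real special tokens, which should be taken from its tokenizer.

```python
# Hypothetical sentinel strings for illustration only; the real special
# tokens are defined by the model's tokenizer and are not reproduced here.
FIM_BEGIN = "<FIM_BEGIN>"
FIM_HOLE = "<FIM_HOLE>"
FIM_END = "<FIM_END>"


def split_at_gap(source: str, start: int, end: int) -> tuple:
    """Cut source[start:end] out and return (prefix, suffix) around the gap."""
    if not 0 <= start <= end <= len(source):
        raise ValueError("gap must lie inside the source")
    return source[:start], source[end:]


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Place prefix, hole marker and suffix so the model fills the hole."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"


source = "def add(a, b):\n    return a + b\n"
gap_start = source.index("return")        # hole starts at the return statement
gap_end = source.index("\n", gap_start)   # ...and ends at the end of that line
prefix, suffix = split_at_gap(source, gap_start, gap_end)
prompt = build_fim_prompt(prefix, suffix)
print(repr(prompt))
```

The model is then expected to generate the removed span (here `return a + b`) after the prompt. Real infilling code would also need to fit prefix and suffix within the 16K-token window.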
Training
DeepSeek Coder models are trained from scratch on a large corpus of code and natural-language text. The natural-language data covers both English and Chinese. The models target diverse coding tasks and are evaluated on benchmarks such as HumanEval, MultiPL-E, MBPP, DS-1000, and APPS.
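Functional-correctness benchmarks such as HumanEval and MBPP are commonly scored as pass@k. This uses the unbiased estimator introduced with HumanEval: 1 - C(n-c, k) / C(n, k) per problem, averaged over problems. The source does not say which k or decoding setup was used for DeepSeek Coder, so the sketch below only illustrates the metric. The per-problem results in it are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem.

    n: samples generated, c: samples passing the unit tests, k: sample budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list, k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results: (samples generated, samples that passed) per problem
results = [(10, 3), (10, 0), (10, 10)]
print(round(benchmark_score(results, k=1), 3))
```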
Guide: Running Locally
To run the deepseek-coder-6.7b-instruct model locally, follow these steps:
- Install the transformers library, plus PyTorch, which the loading code uses:
pip install transformers torch
- Import the necessary libraries and load the model:
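import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True allows transformers to run custom code shipped with the model repository
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
# Load the weights in bfloat16 and move them to the GPU (requires CUDA)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()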
- Prepare your input and generate output:
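# A single-turn conversation in the message format used by apply_chat_template
messages = [{'role': 'user', 'content': "write a quick sort algorithm in python."}]
# Render the conversation with the model's chat template and append the assistant turn prompt
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Greedy decoding (do_sample=False); top_k/top_p are omitted because they only affect sampling
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
# generate() returns prompt + completion, so skip the prompt tokens before decoding
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))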
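In the generation step, `outputs[0][len(inputs[0]):]` drops the prompt before decoding. For causal language models, generate() returns the prompt tokens followed by the newly generated ones. The pure-Python sketch below mirrors that slicing. It uses toy token IDs (hypothetical values, not real tokenizer output), so it runs without downloading the model. The helper name strip_prompt is likewise hypothetical.

```python
from typing import List


def strip_prompt(output_ids: List[int], prompt_ids: List[int]) -> List[int]:
    """Return only the newly generated token IDs.

    A causal LM's generated sequence starts with the prompt, so the first
    len(prompt_ids) positions are the prompt echoed back.
    """
    if output_ids[:len(prompt_ids)] != prompt_ids:
        raise ValueError("output does not start with the prompt")
    return output_ids[len(prompt_ids):]


# Toy IDs standing in for real tokenizer output (illustrative only)
prompt_ids = [101, 7, 8, 9]
output_ids = [101, 7, 8, 9, 42, 43, 2]
print(strip_prompt(output_ids, prompt_ids))  # [42, 43, 2]
```

Decoding the full sequence instead would repeat the chat-formatted prompt in the printed answer.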
The loading code places the model on a CUDA GPU in bfloat16. If no suitable local GPU is available, consider cloud GPUs from providers such as AWS, Google Cloud, or Azure.
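When choosing a GPU, a rough lower bound is the memory needed for the weights alone: parameter count × bytes per parameter (2 bytes for bfloat16). The sketch below applies that rule of thumb, assuming decimal gigabytes. It deliberately ignores activations, the KV cache, and framework overhead, so actual usage will be higher.

```python
# Bytes needed to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for params in (6_700_000_000, 33_000_000_000):
    print(f"{params / 1e9:.1f}B params in bfloat16: ~{weight_memory_gb(params, 'bfloat16'):.1f} GB of weights")
```

By this estimate, the 6.7B model needs roughly 13.4 GB just for its bfloat16 weights.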
License
The code repository is licensed under the MIT License. The models themselves are subject to the Model License, which permits commercial use. For details, see the LICENSE-MODEL file.