DeepSeek Coder 1.3B Instruct
Introduction
DeepSeek Coder is a family of code language models trained from scratch on 2 trillion tokens, with 87% dedicated to code and 13% to natural language in English and Chinese. The models range from 1.3B to 33B parameters and are designed for project-level code completion and infilling, achieving strong performance across a range of programming languages and benchmarks.
Architecture
- Training Data: 2T tokens composed of 87% code and 13% natural language.
- Model Sizes: Available in 1.3B, 5.7B, 6.7B, and 33B parameters.
- Performance: State-of-the-art results on benchmarks such as HumanEval, MultiPL-E, MBPP, DS-1000, and APPS.
- Features: Supports project-level code completion with a 16K context window, plus fill-in-the-blank (code infilling) tasks; see the sketch below.
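The fill-in-the-blank capability works through special FIM sentinel tokens in the prompt. The sketch below shows code insertion with the companion base model, deepseek-coder-1.3b-base, since infilling is a raw-completion task rather than a chat task; the sentinel spellings follow the DeepSeek Coder model card, so verify them against the tokenizer's special tokens before relying on them.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Base model for infilling; the instruct model targets chat-style prompts.
tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/deepseek-coder-1.3b-base", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-1.3b-base",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).cuda()

# <｜fim▁begin｜>prefix<｜fim▁hole｜>suffix<｜fim▁end｜> marks the gap to fill;
# token spellings are taken from the model card, so double-check them locally.
input_text = """<｜fim▁begin｜>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
<｜fim▁hole｜>
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Print only the newly generated tokens, i.e. the infilled middle section
print(tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True))
```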
Training
The deepseek-coder-1.3b-instruct model has 1.3 billion parameters; it is initialized from deepseek-coder-1.3b-base and fine-tuned on 2 billion tokens of instruction data.
Guide: Running Locally
- Setup: Install PyTorch and the `transformers` library (e.g. `pip install torch transformers`); the snippets below assume a CUDA-capable GPU.
- Load Model: Use the `transformers` library to load the model and tokenizer (an `import torch` is added here, since the original snippet uses `torch.bfloat16` without importing it):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True allows transformers to run the model's custom code from the Hub
tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/deepseek-coder-1.3b-instruct", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-1.3b-instruct",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use
).cuda()                         # move the model onto the GPU
```
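If no CUDA GPU is available, a minimal fallback (an illustrative variant, not part of the original card) is to keep the weights on the CPU; float32 is used because bfloat16 support varies across CPUs:

```python
# Hypothetical CPU-only variant; assumes roughly 6 GB of free RAM for fp32 weights
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-1.3b-instruct",
    trust_remote_code=True,
    torch_dtype=torch.float32,  # safe default on CPUs without bfloat16 support
)
```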
- Inference: Prepare the input messages with the model's chat template and generate an answer:

```python
messages = [{"role": "user", "content": "write a quick sort algorithm in python."}]
# apply_chat_template wraps the messages in the prompt format the model was tuned on
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=False,  # greedy decoding; top_k/top_p only take effect when sampling
    top_k=50,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
# Slice off the prompt tokens so that only the model's reply is printed
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
```
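As a usage sketch (the follow-up prompt here is hypothetical, not from the original card), the conversation can be continued by appending the assistant's reply and a new user turn before generating again:

```python
# Append the previous reply, then ask a hypothetical follow-up question
reply = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
messages += [
    {"role": "assistant", "content": reply},
    {"role": "user", "content": "Now add type hints to the function."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(
    inputs, max_new_tokens=512, do_sample=False, eos_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
```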
- Cloud GPUs: Consider using cloud services such as AWS, Google Cloud, or Azure for access to powerful GPUs.
License
The code repository is licensed under the MIT License. Use of the DeepSeek Coder models is subject to a separate Model License, which permits commercial use. For details, refer to the LICENSE-MODEL file.