deepseek-ai/deepseek-coder-6.7b-base
DeepSeek Coder Documentation
Introduction
DeepSeek Coder consists of a series of code language models trained from scratch using 2 trillion tokens, with a composition of 87% code and 13% natural language in both English and Chinese. The models are available in various sizes, from 1B to 33B parameters, and are pre-trained on a project-level code corpus with a window size of 16K and a fill-in-the-blank task for code completion and infilling. DeepSeek Coder achieves state-of-the-art performance on multiple programming languages and benchmarks.
Architecture
The deepseek-coder-6.7b-base model is a 6.7 billion parameter model that uses Multi-Head Attention and was trained on 2 trillion tokens. It is designed for advanced code completion and infilling tasks.
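Because the context window is 16K tokens, prompts longer than the window must be truncated before generation. A minimal sketch of one common strategy, keeping the most recent tokens; the helper name, the plain-list representation of token ids, and the 16,384-token budget are assumptions for illustration, not part of the DeepSeek Coder API:

```python
def clamp_to_window(token_ids, max_tokens=16384):
    """Trim a token-id sequence to fit a 16K context window.

    Hypothetical helper: token_ids is a plain list of integer token ids.
    Keeps the most recent tokens, since code completion depends most on
    the context immediately preceding the cursor.
    """
    if len(token_ids) <= max_tokens:
        return token_ids
    # Drop the oldest tokens so the tail of the prompt is preserved.
    return token_ids[-max_tokens:]

# Usage: a 20,000-token prompt is trimmed to its last 16,384 tokens.
trimmed = clamp_to_window(list(range(20000)))
print(len(trimmed))  # 16384
```

Dropping the head of the prompt is a simplification; project-level tooling often selects which files or snippets to keep rather than truncating blindly.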
Training
- Massive Training Data: Trained on 2T tokens, predominantly code (87%) with natural language data (13%) in English and Chinese.
- Model Variants: Available in sizes of 1.3B, 5.7B, 6.7B, and 33B, offering flexibility and scalability.
- Performance: Achieves top performance on benchmarks like HumanEval, MultiPL-E, MBPP, DS-1000, and APPS.
Guide: Running Locally
- Installation: Install the transformers and torch packages using pip.

  pip install transformers torch

- Code Completion:
  from transformers import AutoTokenizer, AutoModelForCausalLM
  import torch

  tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True).cuda()
  input_text = "#write a quick sort algorithm"
  inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_length=128)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- Environment Setup: Use a local machine with a CUDA-enabled GPU, or consider cloud options such as AWS, Google Cloud, or Azure for better performance.
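The fill-in-the-blank pre-training objective means the base model can also complete code between a prefix and a suffix, not just continue from the end. A minimal prompt-building sketch; the sentinel strings below follow DeepSeek Coder's published insertion examples, but you should verify them against the tokenizer's special tokens before relying on them:

```python
def build_fim_prompt(prefix, suffix):
    # Assumed fill-in-the-middle sentinels from DeepSeek Coder's insertion
    # examples; confirm against the model's tokenizer before use.
    return f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"

# The model is asked to generate the body that belongs between these two parts.
prompt = build_fim_prompt(
    "def quick_sort(arr):\n    if len(arr) <= 1:\n        return arr\n",
    "\n    return quick_sort(left) + [pivot] + quick_sort(right)\n",
)
print(prompt.startswith("<｜fim▁begin｜>"))  # True
```

The resulting prompt string is passed to the tokenizer and model.generate exactly as in the code-completion example above; the model's output fills the hole between prefix and suffix.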
License
This code repository is licensed under the MIT License. Use of DeepSeek Coder models is governed by the Model License, permitting commercial use. For more details, refer to the LICENSE-MODEL.