deepseek-coder-33b-instruct
deepseek-ai

Introduction
DeepSeek Coder is a family of code language models trained on a large dataset of 2 trillion tokens, composed predominantly of code (87%) with the remainder natural language (13%) in English and Chinese. The models, available in sizes from 1.3B to 33B parameters, are designed for project-level code completion and infilling, supported by a 16K context window and an additional fill-in-the-blank training task. They achieve state-of-the-art performance on benchmarks such as HumanEval, MultiPL-E, MBPP, DS-1000, and APPS.
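Because the models are trained with a fill-in-the-blank objective, infilling can be driven with a raw completion prompt. The sketch below is illustrative only: it assumes a model and tokenizer loaded as in the guide further down, and the fill-in-the-middle sentinel token strings are an assumption based on the format published in the DeepSeek Coder repository, so verify them against the tokenizer's special tokens before use.

```python
# Fill-in-the-middle sketch (assumes `model` and `tokenizer` are already loaded as in
# the "Guide: Running Locally" section; the sentinel tokens below are assumed to match
# the ones shipped with the tokenizer -- verify via tokenizer.get_added_vocab()).
prompt = (
    "<｜fim▁begin｜>def fibonacci(n):\n"
    '    """Return the n-th Fibonacci number."""\n'
    "<｜fim▁hole｜>\n"
    "    return a<｜fim▁end｜>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the infilled completion, excluding the prompt tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```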
Architecture
DeepSeek Coder models come in multiple sizes, including 1.3B, 5.7B, 6.7B, and 33B parameters, letting users trade off capability against compute and memory requirements. The 33B parameter model, deepseek-coder-33b-instruct, is fine-tuned on 2 billion tokens of instruction data and offers superior performance on a range of programming tasks.
Training
The models are trained from scratch on a massive corpus of 2 trillion tokens, including both code and natural language data in English and Chinese. This extensive training helps achieve high performance across multiple programming languages and evaluation benchmarks.
Guide: Running Locally
To run the DeepSeek Coder model locally:
- Setup Environment: Ensure you have Python and PyTorch installed. Use a virtual environment for better dependency management.
- Install Transformers: Run `pip install transformers` to get the necessary libraries.
- Load Model and Tokenizer:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the instruct checkpoint; bfloat16 keeps GPU memory usage manageable.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-instruct",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).cuda()
```
- Perform Inference: Use the loaded model and tokenizer for code generation tasks; a minimal example is sketched after this list.
- Suggested Cloud GPUs: For optimal performance, particularly with larger models, consider using cloud services like AWS, GCP, or Azure that offer NVIDIA GPUs.
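The following is a minimal inference sketch. It assumes the model and tokenizer loaded in the step above and uses the chat template bundled with the instruct tokenizer; the prompt and generation parameters (such as max_new_tokens) are illustrative choices, not values prescribed by the model card.

```python
# Minimal chat-style generation sketch (assumes `model` and `tokenizer` from the
# loading step above; generation parameters here are illustrative).
messages = [
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=False,                      # greedy decoding for reproducibility
    eos_token_id=tokenizer.eos_token_id,  # stop at the end-of-sequence token
)
# Strip the prompt tokens and decode only the newly generated completion.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```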
License
The code repository is licensed under the MIT License, while the DeepSeek Coder models are subject to a separate Model License, which permits commercial use. More details can be found in the LICENSE-MODEL file in the repository.