deepseek-ai/deepseek-coder-6.7b-base

DeepSeek Coder Documentation

Introduction

DeepSeek Coder is a series of code language models trained from scratch on 2 trillion tokens, composed of 87% code and 13% natural language in both English and Chinese. The models are available in sizes from 1.3B to 33B parameters and are pre-trained on a project-level code corpus with a 16K context window and a fill-in-the-blank objective, supporting both code completion and infilling. DeepSeek Coder achieves state-of-the-art performance among open-source code models across multiple programming languages and benchmarks.

Architecture

The deepseek-coder-6.7b-base model is a 6.7-billion-parameter decoder-only Transformer using Multi-Head Attention, trained on 2 trillion tokens with a 16K context window. It is designed for advanced code completion and infilling tasks.
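
To inspect these details locally without downloading the weights, here is a minimal sketch, assuming the checkpoint exposes the standard transformers config attributes of LLaMA-style models:

    from transformers import AutoConfig
    
    # Fetches only the model configuration, not the weights.
    config = AutoConfig.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
    print(config.hidden_size, config.num_hidden_layers, config.num_attention_heads)
    print(config.max_position_embeddings)  # maximum context length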

Training

  • Massive Training Data: Trained on 2T tokens, predominantly code (87%) with natural language data (13%) in English and Chinese.
  • Model Variants: Available in sizes of 1.3B, 5.7B, 6.7B, and 33B parameters, offering flexibility and scalability.
  • Performance: Achieves state-of-the-art results among open-source code models on benchmarks such as HumanEval, MultiPL-E, MBPP, DS-1000, and APPS.

Guide: Running Locally

  1. Installation:

    • Install transformers and torch using pip.
    pip install transformers torch
    
  2. Code Completion:

    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch
    
    # Load the tokenizer and model, and move the model to the GPU.
    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True).cuda()
    
    # Tokenize the prompt and move the input tensors to the model's device;
    # a BatchEncoding is moved with .to(), not .cuda().
    input_text = "#write a quick sort algorithm"
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_length=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  3. Environment Setup:

    • Use a local machine with a CUDA-enabled GPU, or consider cloud options such as AWS, Google Cloud, or Azure for faster generation; see the device-check sketch below.
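
    A minimal sketch for checking the environment before loading the model, assuming only that torch is installed:
    
    import torch
    
    # Prefer a CUDA GPU when one is available; generation also runs
    # on a CPU, just much more slowly.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")
    
    With this, the .cuda() call in step 2 can be replaced by .to(device) so the same script also runs on CPU-only machines.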

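  4. Code Infilling:

    The fill-in-the-blank pre-training means the base model can also complete the middle of a snippet given its prefix and suffix. Below is a minimal sketch reusing the tokenizer and model from step 2, assuming the FIM sentinel tokens <｜fim▁begin｜>, <｜fim▁hole｜>, and <｜fim▁end｜> defined by the DeepSeek Coder tokenizer:

    # The prefix and suffix surround the hole the model should fill in.
    prefix = "def quick_sort(arr):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[0]\n    left, right = [], []\n"
    suffix = "\n    return quick_sort(left) + [pivot] + quick_sort(right)"
    input_text = "<｜fim▁begin｜>" + prefix + "<｜fim▁hole｜>" + suffix + "<｜fim▁end｜>"
    
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_length=256)
    # Decode only the newly generated tokens, i.e. the infilled middle.
    print(tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True))
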
License

This code repository is licensed under the MIT License. Use of DeepSeek Coder models is governed by the Model License, which permits commercial use. For more details, refer to the LICENSE-MODEL file.
