Introduction

StarCoder is a 15.5-billion-parameter text generation model specialized for code. It was trained on data from The Stack spanning more than 80 programming languages, and combines Multi-Query Attention with a Fill-in-the-Middle (FIM) training objective, making it well suited to generating code snippets.

Architecture

StarCoder is based on the GPT-2 architecture, enhanced with multi-query attention, and operates with a context window of 8,192 tokens. It was trained on a corpus of roughly 1 trillion tokens of source code across many programming languages, and its Fill-in-the-Middle objective lets it complete code given both a prefix and a suffix, not just a left-to-right prompt.
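The Fill-in-the-Middle objective is what lets the model complete a gap between existing code rather than only continuing from the left. A minimal sketch of how such a prompt is assembled, assuming StarCoder's documented FIM sentinel tokens (<fim_prefix>, <fim_suffix>, <fim_middle>):

```python
# Sketch of a Fill-in-the-Middle (FIM) prompt. The sentinel tokens are the
# ones documented for StarCoder; the model generates the missing middle
# after the <fim_middle> token.
prefix = "def average(numbers):\n    "
suffix = "\n    return total / len(numbers)"

fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
print(fim_prompt)
```

Passed through the tokenizer exactly like a plain prompt, this string asks the model to produce the code that belongs between the prefix and the suffix.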

Training

StarCoder was pretrained for 250,000 steps on roughly 1 trillion tokens. Training ran on 512 NVIDIA A100 GPUs over about 24 days, for a total of 320,256 GPU hours of pretraining plus a further 11,208 GPU hours of fine-tuning. The model was implemented in PyTorch and trained with the Megatron-LM framework.
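As a back-of-envelope check, the reported GPU-hour total is consistent with the GPU count and the roughly 24-day wall-clock duration:

```python
# Sanity check on the pretraining compute figures quoted above.
gpu_hours = 320_256              # reported pretraining GPU hours
num_gpus = 512                   # A100 GPUs used
wall_clock_hours = gpu_hours / num_gpus
wall_clock_days = wall_clock_hours / 24
print(wall_clock_days)           # ~26 days, on the order of the ~24 reported
```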

Guide: Running Locally

  1. Install Dependencies: Ensure you have Python installed, then install the transformers library:
    pip install -q transformers
    
  2. Load the Model: Use the following Python code to load and run the model:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    # Note: bigcode/starcoder is a gated checkpoint; accept its license on the
    # Hugging Face Hub and authenticate (e.g. huggingface-cli login) first.
    checkpoint = "bigcode/starcoder"
    device = "cuda"  # use "cpu" if a GPU is unavailable
    
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    
    inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
    # generate() produces only ~20 new tokens by default; raise max_new_tokens
    # for longer completions
    outputs = model.generate(inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0]))
    
  3. Cloud GPUs: For optimal performance, consider using cloud-based GPU services like AWS, Google Cloud, or Azure to run your model, particularly if you lack local GPU resources.
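When sizing a GPU, local or cloud, a rough estimate of the weight memory is useful. A minimal sketch, assuming 15.5B parameters and counting weights only (activations, KV cache, and framework overhead come on top):

```python
# Approximate memory needed just to hold the 15.5B model weights,
# at a few common precisions.
params = 15.5e9

for precision, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    gigabytes = params * bytes_per_param / 1e9
    print(f"{precision}: ~{gigabytes:.0f} GB")
```

In half precision the weights alone are around 31 GB, which is why a single high-memory GPU such as an 80 GB A100, or a quantized load, is a common choice.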

License

StarCoder is distributed under the BigCode OpenRAIL-M license. Users must comply with the use restrictions and sharing requirements detailed in the license agreement. For inquiries related to the license, contact contact@bigcode-project.org.
