Introduction

StarCoder is a 15.5-billion-parameter text generation model specialized for code. It was trained on data from The Stack spanning more than 80 programming languages, and combines Multi-Query Attention with a Fill-in-the-Middle (FIM) training objective, making it well suited to generating code snippets.

Architecture

StarCoder is based on the GPT-2 architecture, enhanced with multi-query attention, and operates with a context window of 8,192 tokens. It was trained on a corpus of roughly 1 trillion tokens of source code across many programming languages, and its Fill-in-the-Middle objective lets it complete code given both a prefix and a suffix, not just a left-to-right prompt.
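The Fill-in-the-Middle objective is what lets the model complete a gap between existing code rather than only continuing from the left. A minimal sketch of how such a prompt is assembled, assuming StarCoder's documented FIM sentinel tokens (<fim_prefix>, <fim_suffix>, <fim_middle>):

```python
# Sketch of a Fill-in-the-Middle (FIM) prompt. The sentinel tokens are the
# ones documented for StarCoder; the model generates the missing middle
# after the <fim_middle> token.
prefix = "def average(numbers):\n    "
suffix = "\n    return total / len(numbers)"

fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
print(fim_prompt)
```

Passed through the tokenizer exactly like a plain prompt, this string asks the model to produce the code that belongs between the prefix and the suffix.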

Training

StarCoder was pretrained for 250,000 steps on roughly 1 trillion tokens. Training ran on 512 NVIDIA A100 GPUs over about 24 days, for a total of 320,256 GPU hours of pretraining plus a further 11,208 GPU hours of fine-tuning. The model was implemented in PyTorch and trained with the Megatron-LM framework.
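As a back-of-envelope check, the reported GPU-hour total is consistent with the GPU count and the roughly 24-day wall-clock duration:

```python
# Sanity check on the pretraining compute figures quoted above.
gpu_hours = 320_256              # reported pretraining GPU hours
num_gpus = 512                   # A100 GPUs used
wall_clock_hours = gpu_hours / num_gpus
wall_clock_days = wall_clock_hours / 24
print(wall_clock_days)           # ~26 days, on the order of the ~24 reported
```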

Guide: Running Locally

  1. Install Dependencies: Ensure you have Python installed, then install the transformers library:
    pip install -q transformers
    
  2. Load the Model: Use the following Python code to load and run the model:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    # Note: bigcode/starcoder is a gated checkpoint; accept its license on the
    # Hugging Face Hub and authenticate (e.g. huggingface-cli login) first.
    checkpoint = "bigcode/starcoder"
    device = "cuda"  # use "cpu" if a GPU is unavailable
    
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    
    inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
    # generate() produces only ~20 new tokens by default; raise max_new_tokens
    # for longer completions
    outputs = model.generate(inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0]))
    
  3. Cloud GPUs: For optimal performance, consider using cloud-based GPU services like AWS, Google Cloud, or Azure to run your model, particularly if you lack local GPU resources.
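When sizing a GPU, local or cloud, a rough estimate of the weight memory is useful. A minimal sketch, assuming 15.5B parameters and counting weights only (activations, KV cache, and framework overhead come on top):

```python
# Approximate memory needed just to hold the 15.5B model weights,
# at a few common precisions.
params = 15.5e9

for precision, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    gigabytes = params * bytes_per_param / 1e9
    print(f"{precision}: ~{gigabytes:.0f} GB")
```

In half precision the weights alone are around 31 GB, which is why a single high-memory GPU such as an 80 GB A100, or a quantized load, is a common choice.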

License

StarCoder is distributed under the BigCode OpenRAIL-M license. Users must comply with the use restrictions and sharing requirements detailed in the license agreement. For inquiries related to the license, contact contact@bigcode-project.org.
