CodeGen-Multi 6B (Salesforce)
Introduction
CodeGen is a family of autoregressive language models designed for program synthesis, detailed in the paper "A Conversational Paradigm for Program Synthesis". The family includes models of several sizes pre-trained on different datasets. This documentation focuses on the CodeGen-Multi 6B model, which can generate executable code from English prompts.
Architecture
CodeGen-Multi 6B is initialized from the CodeGen-NL 6B checkpoint and further pre-trained on a multi-language code dataset. It contains 6 billion trainable parameters. The model is an autoregressive transformer implemented in PyTorch, suited to generating and completing code snippets.
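As a rough sanity check on the stated size, the parameter count of a decoder-only transformer can be approximated from its depth and hidden dimension using the common 12·L·d² rule of thumb. The sketch below uses illustrative values in the right ballpark for a ~6B model; it does not reproduce the exact CodeGen-Multi 6B configuration.

```python
def approx_decoder_params(n_layer: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only transformer.

    Per layer: ~4*d^2 for the attention projections plus ~8*d^2 for a
    4x-expansion MLP, i.e. ~12*d^2 total; embeddings add vocab_size*d.
    """
    return 12 * n_layer * d_model**2 + vocab_size * d_model

# Illustrative values, not the official CodeGen config.
print(f"{approx_decoder_params(33, 4096, 51200) / 1e9:.2f}B")  # → 6.85B
```

The estimate ignores biases and layer norms, which contribute a negligible fraction of the total.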
Training
The model was initialized from CodeGen-NL 6B and further pre-trained on a large-scale dataset of GitHub repositories comprising 119.2 billion tokens across C, C++, Go, Java, JavaScript, and Python. Training used the cross-entropy loss on multiple TPU-v4-512 instances, employing both data and model parallelism.
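The cross-entropy objective mentioned above is simply the negative log-probability the model assigns to the true next token. A minimal, framework-free sketch with toy logits (not CodeGen's actual outputs):

```python
import math

def next_token_cross_entropy(logits, target):
    """Negative log-probability of the target token under softmax(logits)."""
    denom = sum(math.exp(x) for x in logits)
    p_target = math.exp(logits[target]) / denom
    return -math.log(p_target)

# Toy 3-token vocabulary: the model favors token 0, which is the correct
# target here, so the loss is low.
loss = next_token_cross_entropy([2.0, 1.0, 0.1], target=0)
print(round(loss, 3))
```

During pre-training this quantity is averaged over every token position in a batch and minimized by gradient descent.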
Guide: Running Locally
To use the CodeGen-Multi 6B model locally, follow these basic steps:
- Install the required libraries:

  ```
  pip install transformers torch
  ```

- Load the model and generate a completion:

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM

  tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-6B-multi")
  model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-6B-multi")

  text = "def hello_world():"
  input_ids = tokenizer(text, return_tensors="pt").input_ids
  generated_ids = model.generate(input_ids, max_length=128)
  print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
  ```

- Consider using cloud GPUs: for efficient handling of a 6-billion-parameter model, cloud-based GPU services such as AWS, GCP, or Azure are recommended.
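To gauge whether a given GPU can host the model, a back-of-the-envelope memory estimate helps: the weights alone need parameter count × bytes per parameter, before any activation or KV-cache overhead. A sketch using a nominal 6e9 parameters (illustrative, not the exact checkpoint size):

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """GiB needed just to hold the weights (no activations or cache)."""
    return n_params * bytes_per_param / 2**30

# Nominal 6e9 parameters; the real checkpoint differs slightly.
for dtype, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype}: ~{weight_memory_gib(6e9, nbytes):.1f} GiB")
```

In float16 the weights alone occupy roughly 11 GiB, which is why a GPU with 16 GB or more of memory, or a cloud instance, is the practical choice.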
License
The CodeGen-Multi 6B model is licensed under the BSD-3-Clause license, allowing for modification and distribution with minimal restrictions.