CodeGen-Multi 6B (Salesforce)
Introduction
CodeGen is a family of autoregressive language models designed for program synthesis, detailed in the paper "A Conversational Paradigm for Program Synthesis". The family includes models of several sizes pre-trained on different datasets. This documentation focuses on the CodeGen-Multi 6B model, which can generate executable code from English prompts.
Architecture
CodeGen-Multi 6B is initialized from the CodeGen-NL 6B checkpoint and further pre-trained on a multi-language code dataset. It contains 6 billion trainable parameters. The model is an autoregressive transformer implemented in PyTorch, suited to generating and completing code snippets.
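As a rough sanity check on the stated size, the parameter count of a decoder-only transformer can be approximated from its depth and hidden dimension using the common 12·L·d² rule of thumb. The sketch below uses illustrative values in the right ballpark for a ~6B model; it does not reproduce the exact CodeGen-Multi 6B configuration.

```python
def approx_decoder_params(n_layer: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only transformer.

    Per layer: ~4*d^2 for the attention projections plus ~8*d^2 for a
    4x-expansion MLP, i.e. ~12*d^2 total; embeddings add vocab_size*d.
    """
    return 12 * n_layer * d_model**2 + vocab_size * d_model

# Illustrative values, not the official CodeGen config.
print(f"{approx_decoder_params(33, 4096, 51200) / 1e9:.2f}B")  # → 6.85B
```

The estimate ignores biases and layer norms, which contribute a negligible fraction of the total.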
Training
The model was initialized from CodeGen-NL 6B and further pre-trained on a large-scale dataset of GitHub repositories comprising 119.2 billion tokens across C, C++, Go, Java, JavaScript, and Python. Training used the cross-entropy loss on multiple TPU-v4-512 instances, employing both data and model parallelism.
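The cross-entropy objective mentioned above is simply the negative log-probability the model assigns to the true next token. A minimal, framework-free sketch with toy logits (not CodeGen's actual outputs):

```python
import math

def next_token_cross_entropy(logits, target):
    """Negative log-probability of the target token under softmax(logits)."""
    denom = sum(math.exp(x) for x in logits)
    p_target = math.exp(logits[target]) / denom
    return -math.log(p_target)

# Toy 3-token vocabulary: the model favors token 0, which is the correct
# target here, so the loss is low.
loss = next_token_cross_entropy([2.0, 1.0, 0.1], target=0)
print(round(loss, 3))
```

During pre-training this quantity is averaged over every token position in a batch and minimized by gradient descent.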
Guide: Running Locally
To use the CodeGen-Multi 6B model locally, follow these basic steps:
- Install the required libraries:

  ```
  pip install transformers torch
  ```

- Load the model and generate a completion:

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM

  tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-6B-multi")
  model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-6B-multi")

  text = "def hello_world():"
  input_ids = tokenizer(text, return_tensors="pt").input_ids
  generated_ids = model.generate(input_ids, max_length=128)
  print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
  ```

- Consider using cloud GPUs: for efficient handling of a 6-billion-parameter model, cloud-based GPU services such as AWS, GCP, or Azure are recommended.
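To gauge whether a given GPU can host the model, a back-of-the-envelope memory estimate helps: the weights alone need parameter count × bytes per parameter, before any activation or KV-cache overhead. A sketch using a nominal 6e9 parameters (illustrative, not the exact checkpoint size):

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """GiB needed just to hold the weights (no activations or cache)."""
    return n_params * bytes_per_param / 2**30

# Nominal 6e9 parameters; the real checkpoint differs slightly.
for dtype, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype}: ~{weight_memory_gib(6e9, nbytes):.1f} GiB")
```

In float16 the weights alone occupy roughly 11 GiB, which is why a GPU with 16 GB or more of memory, or a cloud instance, is the practical choice.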
License
The CodeGen-Multi 6B model is licensed under the BSD-3-Clause license, allowing for modification and distribution with minimal restrictions.