CodeGen-16B-mono
Salesforce

Introduction
CodeGen is a family of autoregressive language models for program synthesis, i.e., generating executable code from English prompts. The variant discussed here, CodeGen-Mono 16B, was further pre-trained on Python data and contains 16 billion trainable parameters. The research and model are detailed in the paper "A Conversational Paradigm for Program Synthesis."
Architecture
CodeGen-Mono 16B is initialized from CodeGen-Multi 16B and further pre-trained on a Python-only dataset. It is a decoder-only transformer with 16 billion parameters that handles long token sequences and generates code autoregressively: given an input sequence, it predicts the next token, conditioning each prediction on all tokens that precede it.
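To make the autoregressive setup concrete, the minimal sketch below (assuming the transformers and torch libraries, and enough memory to load the checkpoint) scores a single next-token prediction; generation simply repeats this step, appending each chosen token to the input.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-16B-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-16B-mono")

# The model returns logits over the vocabulary for every input position;
# the logits at the last position are its prediction for the next token.
input_ids = tokenizer("def add(a, b):", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(input_ids).logits           # shape: (1, seq_len, vocab_size)
next_token_id = logits[0, -1].argmax().item()  # greedy choice of the next token
print(tokenizer.decode([next_token_id]))
```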
Training
CodeGen-Mono 16B was trained by initializing it with the CodeGen-Multi 16B model and further pre-training on the BigPython dataset, which contains 71.7 billion tokens of Python code. Training minimizes a cross-entropy loss to maximize the likelihood of the training sequences, and was carried out on Google's TPU-v4-512 hardware using both data and model parallelism to handle the large dataset and model size.
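As an illustration of the objective (a minimal sketch; the paper specifies cross-entropy over next-token predictions, but this exact implementation is an assumption), the loss shifts logits and targets by one position so that each token is predicted from its prefix:

```python
import torch
import torch.nn.functional as F

def lm_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size) model outputs
    # tokens: (batch, seq_len) input token ids
    # Predict token t+1 from positions up to t: drop the last logit,
    # drop the first target, and average cross-entropy over all positions.
    shift_logits = logits[:, :-1, :]
    shift_targets = tokens[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_targets.reshape(-1),
    )
```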
Guide: Running Locally
To run CodeGen-Mono 16B locally, follow these steps:
- Install Libraries: Ensure you have the transformers library installed.

  ```
  pip install transformers
  ```
- Load the Model: Use the transformers library to load the model and tokenizer.

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM

  tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-16B-mono")
  model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-16B-mono")
  ```
- Generate Code: Input your code prompt and generate the continuation.

  ```python
  text = "def hello_world():"
  input_ids = tokenizer(text, return_tensors="pt").input_ids
  generated_ids = model.generate(input_ids, max_length=128)
  print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
  ```
- Consider Cloud GPUs: Due to the model's size (16 billion parameters, roughly 32 GB of weights in half precision), consider using cloud-based GPUs such as AWS EC2, Google Cloud, or Azure; a memory-saving loading sketch follows this list.
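One common way to reduce memory pressure is to load the weights in half precision with automatic device placement. This is a sketch using standard transformers options (the accelerate package is required for device_map, and exact hardware requirements depend on your setup):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-16B-mono")
# Load weights as float16 and let accelerate distribute layers across the
# available GPUs (and CPU, if necessary); requires `pip install accelerate`.
model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/codegen-16B-mono",
    torch_dtype=torch.float16,
    device_map="auto",
)

input_ids = tokenizer("def hello_world():", return_tensors="pt").input_ids.to(model.device)
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```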
License
CodeGen-Mono 16B is released under the BSD-3-Clause license, which permits redistribution and use subject to the license's conditions.