CodeGen-Mono 350M

Salesforce

Introduction

CodeGen is a family of autoregressive language models for program synthesis, introduced in the paper "A Conversational Paradigm for Program Synthesis." The models come in three pre-training data variants, NL (The Pile), Multi (multiple programming languages), and Mono (Python only), and four sizes (350M, 2B, 6B, 16B). This checkpoint, CodeGen-Mono 350M, is the 350M-parameter Python-specialized variant.

Architecture

CodeGen-Mono 350M is a decoder-only transformer with 350 million trainable parameters. It is initialized from CodeGen-Multi 350M and further pre-trained on a Python-only corpus, and it is designed to generate code from natural-language prompts.
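As a quick sanity check, the parameter count can be inspected after loading the checkpoint (a minimal sketch, assuming transformers and PyTorch are installed; the printed figure will be on the order of 350M rather than exactly 350M, since the size in the name is nominal):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

# Sum the element counts of all parameter tensors.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")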

Training

This model was pre-trained on BigPython, a dataset of 71.7 billion tokens of Python code. Training minimized the standard cross-entropy loss, i.e., it maximized the likelihood of each token given the preceding ones. The CodeGen family was trained on multiple TPU-v4-512 pods from Google, using both data and model parallelism.
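To make the objective concrete, the sketch below computes the same next-token cross-entropy on a toy snippet. Passing labels to a Hugging Face causal LM makes it shift the labels internally and return the average negative log-likelihood per token (illustrative only; this does not reproduce the actual pre-training pipeline):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

input_ids = tokenizer("def add(a, b):\n    return a + b", return_tensors="pt").input_ids

# With labels set, the model computes the next-token cross-entropy itself.
with torch.no_grad():
    loss = model(input_ids, labels=input_ids).loss
print(loss.item())  # average negative log-likelihood per token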

Guide: Running Locally

To use CodeGen-Mono 350M locally:

  1. Install Transformers: Install the Hugging Face Transformers library and a backend such as PyTorch (for example, pip install transformers torch).
  2. Load the Model: Use the AutoTokenizer and AutoModelForCausalLM classes to load the checkpoint from the Salesforce/codegen-350M-mono repository.
  3. Generate Code: Provide a prompt (for example, a comment or a partial function definition) and call the generate method to complete it, as in the snippet below.
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download (or load from cache) the tokenizer and model weights.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

# Prompt with the start of a function; the model completes it.
text = "def hello_world():"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Greedy decoding, up to 128 tokens total (prompt included).
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
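The snippet above uses greedy decoding, which is deterministic. For more varied completions, generate also accepts standard sampling arguments; the values below are illustrative, not tuned:

# Nucleus sampling; the temperature and top_p values are illustrative.
generated_ids = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.2,
    top_p=0.95,
    max_length=128,
)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))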

Cloud GPUs

Consider using cloud GPU services such as AWS EC2, Google Cloud, or Azure for efficient model inference, especially for larger models.
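As a minimal sketch of GPU inference (assuming PyTorch with CUDA is available, and falling back to CPU otherwise):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
# Move the model's weights to the selected device.
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono").to(device)

# Inputs must live on the same device as the model.
input_ids = tokenizer("def fibonacci(n):", return_tensors="pt").input_ids.to(device)
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))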

License

The CodeGen-Mono 350M model is licensed under the BSD 3-Clause License.
