CodeGen 16B Mono

Salesforce

Introduction

CodeGen is a family of autoregressive language models designed for program synthesis, i.e., generating executable code from English prompts. The specific model discussed here, CodeGen-Mono 16B, has been pre-trained on a large Python-only corpus and contains 16 billion trainable parameters. The research and models are detailed in the paper "A Conversational Paradigm for Program Synthesis."

Architecture

CodeGen-Mono 16B is initialized from CodeGen-Multi 16B and further pre-trained on a Python-specific dataset. It is a decoder-only transformer language model: given an input sequence, it predicts the next token, and repeating this step token by token produces a code completion. With 16 billion parameters it is the largest model in the CodeGen family.
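The autoregressive loop described above can be sketched with a toy stand-in. The vocabulary and the scoring rule below are purely illustrative assumptions, not CodeGen's actual tokenizer or weights; the point is only the shape of the loop: score every candidate token given the context, append the best one, repeat.

```python
# Toy greedy autoregressive decoding. The "model" is a hypothetical
# scoring rule for illustration, not CodeGen itself.
VOCAB = ["def", "hello", "(", ")", ":", "<eos>"]

def toy_next_token_scores(context):
    # Hypothetical rule: prefer the token whose index in VOCAB
    # follows the index of the most recent token.
    last = VOCAB.index(context[-1])
    return [1.0 if i == (last + 1) % len(VOCAB) else 0.0
            for i in range(len(VOCAB))]

def greedy_generate(prompt, max_new_tokens=10):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = toy_next_token_scores(tokens)
        next_token = VOCAB[scores.index(max(scores))]
        tokens.append(next_token)
        if next_token == "<eos>":  # stop at end-of-sequence
            break
    return tokens

print(greedy_generate(["def"]))
# → ['def', 'hello', '(', ')', ':', '<eos>']
```

A real model replaces the hand-written scoring rule with a learned distribution over its vocabulary, and sampling strategies (temperature, top-p) can replace the greedy argmax.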

Training

CodeGen-Mono 16B was trained by initializing from the CodeGen-Multi 16B checkpoint and continuing pre-training on the BigPython dataset, which contains 71.7 billion tokens of Python code. Training minimizes the cross-entropy loss (equivalently, maximizes the likelihood of the training sequences) and was carried out on Google's TPU-v4-512 systems, using both data and model parallelism to handle the dataset size and compute requirements.
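The objective is the standard per-token cross-entropy: the negative mean log-probability the model assigns to each correct next token. A minimal sketch of the formula (a restatement of the standard definition, not Salesforce's training code):

```python
import math

def cross_entropy(probabilities, target_indices):
    """Mean negative log-likelihood of the target token at each position.

    probabilities: one probability distribution (list) per position.
    target_indices: index of the correct next token at each position.
    """
    return -sum(math.log(p[t])
                for p, t in zip(probabilities, target_indices)) / len(target_indices)

# Two positions; the model gives the correct token probability 0.5, then 0.25.
probs = [[0.5, 0.3, 0.2], [0.25, 0.5, 0.25]]
targets = [0, 0]
print(round(cross_entropy(probs, targets), 4))
# → 1.0397  (i.e. -(ln 0.5 + ln 0.25) / 2)
```

Driving this quantity down pushes the model to put more probability mass on the tokens that actually follow in the training corpus.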

Guide: Running Locally

To run CodeGen-Mono 16B locally, follow these steps:

  1. Install Libraries: Ensure you have the transformers library installed, along with a backend such as PyTorch.

    pip install transformers torch
    
    
  2. Load the Model: Use the transformers library to load the model and tokenizer.

    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-16B-mono")
    model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-16B-mono")
    
  3. Generate Code: Input your code prompt and generate the continuation.

    text = "def hello_world():"
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    
    generated_ids = model.generate(input_ids, max_length=128)
    print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
    
  4. Consider Cloud GPUs: Due to the model's size, consider using cloud-based GPUs such as AWS EC2, Google Cloud, or Azure for efficient computation.
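To see why step 4 recommends cloud GPUs, a back-of-the-envelope estimate of the memory needed just to hold the weights (assuming a round 16 billion parameters, and ignoring activations and the KV cache, which add more):

```python
# Rough weight-memory estimate: parameter count times bytes per parameter.
PARAMS = 16_000_000_000  # approximate; the exact count differs slightly

def weight_gib(bytes_per_param):
    return PARAMS * bytes_per_param / 1024**3

print(f"fp32: {weight_gib(4):.0f} GiB, fp16: {weight_gib(2):.0f} GiB")
# → roughly 60 GiB in float32, 30 GiB in float16
```

Even in half precision the weights alone exceed the memory of most consumer GPUs, which is why a cloud instance with a large-memory accelerator (or multi-GPU sharding) is the practical route.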

License

CodeGen-Mono 16B is released under the BSD-3-Clause license, allowing for redistribution and use with certain conditions.
