CodeGen-Multi 6B

Salesforce

Introduction

CodeGen is a family of autoregressive language models for program synthesis, introduced in the paper "A Conversational Paradigm for Program Synthesis". The family spans several model sizes and pre-training datasets. This documentation covers the CodeGen-Multi 6B model, which can generate executable code from English prompts and complete partially written code.

Architecture

CodeGen-Multi 6B is initialized from the CodeGen-NL 6B model and further pre-trained on a dataset spanning multiple programming languages. It contains 6 billion trainable parameters. The model is an autoregressive Transformer decoder, implemented in PyTorch, suited to generating and completing code snippets.
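The autoregressive formulation can be illustrated with a toy next-token loop: at each step the model scores candidate next tokens given the prefix, and greedy decoding appends the highest-scoring one. The hard-coded bigram table below is purely an illustrative stand-in for the Transformer, not part of CodeGen.

```python
# Toy illustration of autoregressive (left-to-right) decoding.
# A hypothetical bigram score table stands in for the Transformer;
# the real model scores tokens with 6B parameters, not a lookup.
BIGRAM_SCORES = {
    "def": {"hello": 0.9, "world": 0.1},
    "hello": {"world": 0.8, "(": 0.2},
    "world": {"(": 0.7, ":": 0.3},
    "(": {")": 1.0},
    ")": {":": 1.0},
}

def greedy_decode(prompt_tokens, max_new_tokens=4):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = BIGRAM_SCORES.get(tokens[-1])
        if not scores:  # no known continuation: stop early
            break
        # Greedy decoding: take the single highest-scoring next token.
        tokens.append(max(scores, key=scores.get))
    return tokens

print(greedy_decode(["def", "hello"]))
```

The real model replaces greedy selection with sampling strategies (temperature, top-p) exposed through `model.generate`, but the left-to-right loop is the same.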

Training

The model was initialized from CodeGen-NL 6B and further pre-trained on a large-scale dataset drawn from GitHub repositories, comprising 119.2 billion tokens of C, C++, Go, Java, JavaScript, and Python code. Training used the cross-entropy loss on multiple TPU-v4-512 units, with both data and model parallelism.
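The cross-entropy objective mentioned above can be sketched in a few lines: at each position the model predicts a distribution over the next token, and the loss is the average negative log-probability assigned to the token that actually follows. The per-position probabilities below are hypothetical, chosen only to show the arithmetic.

```python
import math

def next_token_cross_entropy(predicted_probs, target_tokens):
    """Average negative log-likelihood of the target tokens.

    predicted_probs[i] is the model's distribution over candidate
    next tokens at position i; target_tokens[i] is the token that
    actually follows in the training data.
    """
    nll = 0.0
    for probs, target in zip(predicted_probs, target_tokens):
        nll -= math.log(probs[target])
    return nll / len(target_tokens)

# Hypothetical distributions while training on the snippet "def f ( )":
preds = [
    {"f": 0.5, "g": 0.5},   # prediction after "def"
    {"(": 0.8, ":": 0.2},   # prediction after "def f"
    {")": 0.9, "x": 0.1},   # prediction after "def f ("
]
print(next_token_cross_entropy(preds, ["f", "(", ")"]))
```

Minimizing this quantity over 119.2 billion tokens is what pushes the model toward likely code continuations.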

Guide: Running Locally

To use the CodeGen-Multi 6B model locally, follow these basic steps:

  1. Install Required Libraries:

    pip install transformers torch
    
  2. Load the Model:

    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    # Download the tokenizer and model weights from the Hugging Face Hub
    # (several GB; the first call may take a while).
    tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-6B-multi")
    model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-6B-multi")
    
    # Tokenize a prompt and generate a completion of up to 128 tokens.
    text = "def hello_world():"
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    
    generated_ids = model.generate(input_ids, max_length=128)
    print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
    
  3. Consider Using Cloud GPUs:
    A 6-billion-parameter model requires substantial memory for inference; for efficient use, cloud-based GPU services such as AWS, GCP, or Azure are recommended.
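A rough back-of-the-envelope calculation explains the GPU recommendation: each parameter takes 2 bytes at 16-bit precision, so the weights alone of a 6-billion-parameter model occupy on the order of 11 GiB before activations and the attention cache are counted. A quick sketch (parameter count rounded to 6e9 for illustration):

```python
def weight_memory_gib(num_params, bytes_per_param):
    """GiB needed just to hold the model weights in memory."""
    return num_params * bytes_per_param / 1024**3

PARAMS = 6e9  # approximate parameter count of CodeGen-Multi 6B

for label, nbytes in [("float32", 4), ("float16/bfloat16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

Loading at half precision or with 8-bit quantization reduces this footprint, but even then the model is a tight fit for most consumer GPUs.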

License

The CodeGen-Multi 6B model is licensed under the BSD-3-Clause license, allowing for modification and distribution with minimal restrictions.
