Code Llama 7B (codellama/CodeLlama-7b-hf)

Introduction

Code Llama is a family of generative text models designed for code synthesis and understanding, released in sizes from 7 billion to 34 billion parameters. This card covers the 7B base version in the Hugging Face Transformers format. The models handle tasks such as code completion and infilling, with dedicated variants specialized for Python and for instruction following.

Architecture

Code Llama is an auto-regressive language model built on an optimized transformer architecture. Each size ships in three variants: a base model for general code synthesis and understanding (this repository), a Python specialization, and an instruction-tuned model. Available sizes are 7B, 13B, and 34B parameters.
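
The architecture hyperparameters (depth, width, attention heads) can be inspected from the model's Hugging Face configuration without downloading the weights. A minimal sketch, assuming transformers is installed and the Hub is reachable:

    from transformers import AutoConfig

    # Fetches only the small config file, not the model weights.
    config = AutoConfig.from_pretrained("codellama/CodeLlama-7b-hf")
    print(config.model_type)             # architecture family ("llama")
    print(config.num_hidden_layers)      # transformer depth
    print(config.hidden_size)            # model width
    print(config.num_attention_heads)    # attention heads per layer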

Training

The Code Llama models were trained on Meta's Research Super Cluster using custom training and fine-tuning libraries. Training consumed roughly 400K GPU hours on A100-80GB hardware across all model sizes and variants. The training data is the same as that used for Llama 2, with different weightings, as detailed in the accompanying research paper.

Guide: Running Locally

To run Code Llama locally, follow these steps:

  1. Install Dependencies: Ensure you have transformers and accelerate installed. Use the following command:
    pip install transformers accelerate
    
  2. Load the Model: Use the transformers library to load the tokenizer and build a text-generation pipeline:
    from transformers import AutoTokenizer
    import transformers
    import torch
    
    model = "codellama/CodeLlama-7b-hf"
    
    tokenizer = AutoTokenizer.from_pretrained(model)
    # device_map="auto" places the weights on the available GPU(s);
    # float16 halves memory use relative to full precision.
    pipeline = transformers.pipeline(
        "text-generation",
        model=model,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    
    # Complete a code prompt; low temperature with top-k/top-p sampling
    # keeps completions focused while allowing some variation.
    sequences = pipeline(
        'import socket\n\ndef ping_exponential_backoff(host: str):',
        do_sample=True,
        top_k=10,
        temperature=0.1,
        top_p=0.95,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=200,
    )
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")
    
  3. Cloud GPUs: For acceptable generation speed, a GPU is strongly recommended; in float16 the 7B weights alone occupy roughly 14 GB of memory. Cloud options include AWS EC2 GPU instances, Google Cloud Platform, and Azure. The base model also supports infilling, as shown in the next step.
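  4. Infilling (optional): The base model can fill in code between a given prefix and suffix. With recent versions of transformers, the tokenizer recognizes a <FILL_ME> placeholder and rewrites the prompt into Code Llama's fill-in-the-middle format. A minimal sketch (the example function is illustrative):
    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch
    
    model_id = "codellama/CodeLlama-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    
    # <FILL_ME> marks the span the model should generate.
    prompt = 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result'
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    
    # Decode only the newly generated tokens (the infilled middle).
    filling = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    print(prompt.replace("<FILL_ME>", filling))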

License

The Code Llama models are governed by a custom commercial license from Meta; see Meta's license page for details. They are intended for commercial and research use in English and relevant programming languages, and all use must comply with applicable laws and regulations.
