CodeGeeX4-ALL-9B

THUDM

Introduction

CodeGeeX4-ALL-9B is an open-source, multilingual code generation model from THUDM, the latest iteration of the CodeGeeX series. Built on GLM-4-9B, it supports code completion and generation, code interpretation, web search, function calling, and repository-level code Q&A. With fewer than 10 billion parameters, it delivers highly competitive results on public benchmarks such as BigCodeBench and NaturalCodeBench, offering a strong balance between inference speed and output quality.

Architecture

CodeGeeX4-ALL-9B is a transformer-based causal language model trained for multilingual code generation. It inherits the GLM-4-9B architecture and provides comprehensive code-related functionality in both Chinese and English, matching or outperforming much larger general-purpose models on code tasks.

Training

The model is obtained by continually training GLM-4-9B on code-focused data covering a wide range of software development scenarios. This continual training significantly strengthens its code generation ability and yields competitive results on several coding benchmarks.

Guide: Running Locally

To run CodeGeeX4-ALL-9B locally, follow these steps:

  1. Install Dependencies: Ensure you have a transformers version between 4.39.0 and 4.40.2 installed, for example:
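    # A minimal install sketch; torch is included because the snippets below use it.
    pip install torch "transformers>=4.39.0,<=4.40.2"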
  2. Set Up Environment:
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    # Use a GPU if available; CPU inference works but is slow for a 9B-parameter model.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    
    # trust_remote_code is required because the repository ships custom GLM model code.
    tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex4-all-9b", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        "THUDM/codegeex4-all-9b",
        torch_dtype=torch.bfloat16,   # half-precision weights, roughly 19 GB
        low_cpu_mem_usage=True,
        trust_remote_code=True
    ).to(device).eval()
    
  3. Generate Code:
    # Build a chat-formatted prompt and move the tensors to the model's device.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": "write a quick sort"}],
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True
    ).to(device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_length=256)
        # Drop the prompt tokens so only the newly generated text is decoded.
        outputs = outputs[:, inputs['input_ids'].shape[1]:]
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  4. Cloud GPUs: For good performance, consider running the model on a cloud GPU instance (e.g., on AWS, Google Cloud, or Azure); a rough memory estimate is sketched below.
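As a quick sizing check, the bfloat16 weights dominate GPU memory use. A minimal sketch, assuming an approximate parameter count of 9.4 billion (the card itself only states "fewer than 10 billion"):

    # Back-of-the-envelope GPU memory estimate for bfloat16 inference.
    params = 9.4e9           # approximate parameter count (assumption)
    bytes_per_param = 2      # bfloat16 stores 2 bytes per parameter
    print(f"~{params * bytes_per_param / 1e9:.0f} GB for weights alone, plus KV-cache headroom")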

License

The model weights are licensed under the CodeGeeX4 license. For detailed licensing information, refer to the LICENSE file in the model repository.
