Chinese Llama 2 7b 4bit

LinkSoul

Introduction

The Chinese LLaMA-2-7B-4BIT is a fully open-source, commercially usable version of the Llama 2 model. It supports both Chinese and English and follows the Llama-2-chat prompt format, so it remains compatible with optimizations built for the original model.
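
For reference, the Llama-2-chat format wraps each user turn in [INST] ... [/INST] markers, with the system prompt inside a <<SYS>> block. A minimal sketch of the scaffolding (the system and user text below are placeholders, not part of the model card):

    # Only the [INST]/<<SYS>> scaffolding is fixed; the text is illustrative.
    prompt = (
        "[INST] <<SYS>>\n"
        "You are a helpful assistant.\n"
        "<</SYS>>\n\n"
        "Hello, who are you? [/INST]"
    )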

Architecture

The model is a 7-billion-parameter variant of Llama 2, quantized to 4 bits for memory-efficient deployment, and is tailored for text generation in both Chinese and English.
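
As a back-of-envelope illustration of why 4-bit quantization matters at this scale, the weight memory alone shrinks by roughly 4x versus float16 (the figures below ignore activations and quantization overhead):

    params = 7e9                  # 7 billion parameters
    fp16_gb = params * 2 / 1e9    # 2 bytes per weight in float16 -> ~14 GB
    int4_gb = params * 0.5 / 1e9  # 0.5 bytes per weight at 4 bits -> ~3.5 GB
    print(f"float16: ~{fp16_gb:.0f} GB, 4-bit: ~{int4_gb:.1f} GB")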

Training

The model is trained on a dataset called "instruction_merge_set," which contains 10 million instruction entries in English and Chinese. The training and inference code is available on GitHub, making it straightforward to replicate the setup or build on it.
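
If the dataset is published on the Hugging Face Hub, it can be inspected with the datasets library; the hub id below is an assumption inferred from the dataset's name:

    from datasets import load_dataset

    # "LinkSoul/instruction_merge_set" is a hypothetical hub id based on the
    # dataset's name; adjust it to wherever the data is actually hosted.
    ds = load_dataset("LinkSoul/instruction_merge_set", split="train")
    print(len(ds))   # number of instruction entries
    print(ds[0])     # one instruction/response record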

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Dependencies: Ensure you have PyTorch and Transformers installed; 4-bit loading additionally requires the accelerate and bitsandbytes packages (e.g. pip install torch transformers accelerate bitsandbytes).
  2. Load the Model: Use the AutoTokenizer and AutoModelForCausalLM from the Transformers library.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
    
    model_path = "LinkSoul/Chinese-Llama-2-7b-4bit"
    
    # use_fast=False selects the slow SentencePiece tokenizer, matching the
    # upstream example.
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    
    # load_in_4bit=True quantizes the weights at load time (needs bitsandbytes);
    # device_map='auto' spreads layers across the available GPU(s) and CPU.
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_4bit=True,
        torch_dtype=torch.float16,
        device_map='auto'
    )
    
  3. Generate Text: Build a Llama-2-chat prompt and call the generate method; the TextStreamer imported above prints tokens as they are generated.
    instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant...<</SYS>>\n\n{} [/INST]"""
    # The question asks, in Chinese: "Answer in English: what is fuqi feipian
    # (Sichuan-style sliced beef and offal in chili sauce)?"
    prompt = instruction.format("用英文回答,什么是夫妻肺片?")
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(), max_new_tokens=4096, streamer=streamer)
    
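If you want the completion as a plain string rather than streamed output, decode the generated ids; this small follow-up reuses generate_ids and tokenizer from step 3:

    # Decode the full sequence (prompt + completion) back into text.
    response = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]
    print(response)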

For optimal performance, run the model on a GPU; cloud GPU instances from AWS, Google Cloud, or Azure are a convenient option.

License

The project is distributed under the Apache-2.0 license, allowing for both personal and commercial use.
