Chinese Llama 2 7b 4bit

LinkSoul

Introduction

The Chinese LLaMA-2-7B-4BIT is a fully open-source, commercially usable version of the Llama 2 model. It supports both Chinese and English and follows the Llama-2-chat prompt format, so it remains compatible with optimizations built for the original model.
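
For reference, the Llama-2-chat format wraps each user turn in [INST] ... [/INST] markers, with the system prompt inside a <<SYS>> block. A minimal sketch of the scaffolding (the system and user text below are placeholders, not part of the model card):

    # Only the [INST]/<<SYS>> scaffolding is fixed; the text is illustrative.
    prompt = (
        "[INST] <<SYS>>\n"
        "You are a helpful assistant.\n"
        "<</SYS>>\n\n"
        "Hello, who are you? [/INST]"
    )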

Architecture

The model is a 7-billion-parameter variant of Llama 2, quantized to 4 bits for memory-efficient deployment, and is tailored for text generation in both Chinese and English.
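
As a back-of-envelope illustration of why 4-bit quantization matters at this scale, the weight memory alone shrinks by roughly 4x versus float16 (the figures below ignore activations and quantization overhead):

    params = 7e9                  # 7 billion parameters
    fp16_gb = params * 2 / 1e9    # 2 bytes per weight in float16 -> ~14 GB
    int4_gb = params * 0.5 / 1e9  # 0.5 bytes per weight at 4 bits -> ~3.5 GB
    print(f"float16: ~{fp16_gb:.0f} GB, 4-bit: ~{int4_gb:.1f} GB")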

Training

The model is trained on a dataset called "instruction_merge_set," which contains 10 million instruction entries in English and Chinese. The training and inference code is available on GitHub, making it straightforward to replicate the setup or build on it.
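
If the dataset is published on the Hugging Face Hub, it can be inspected with the datasets library; the hub id below is an assumption inferred from the dataset's name:

    from datasets import load_dataset

    # "LinkSoul/instruction_merge_set" is a hypothetical hub id based on the
    # dataset's name; adjust it to wherever the data is actually hosted.
    ds = load_dataset("LinkSoul/instruction_merge_set", split="train")
    print(len(ds))   # number of instruction entries
    print(ds[0])     # one instruction/response record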

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Dependencies: Ensure you have PyTorch and Transformers installed; 4-bit loading additionally requires the accelerate and bitsandbytes packages (e.g. pip install torch transformers accelerate bitsandbytes).
  2. Load the Model: Use the AutoTokenizer and AutoModelForCausalLM from the Transformers library.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
    
    model_path = "LinkSoul/Chinese-Llama-2-7b-4bit"
    
    # use_fast=False selects the slow SentencePiece tokenizer, matching the
    # upstream example.
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    
    # load_in_4bit=True quantizes the weights at load time (needs bitsandbytes);
    # device_map='auto' spreads layers across the available GPU(s) and CPU.
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_4bit=True,
        torch_dtype=torch.float16,
        device_map='auto'
    )
    
  3. Generate Text: Build a Llama-2-chat prompt and call the generate method; the TextStreamer imported above prints tokens as they are generated.
    instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant...<</SYS>>\n\n{} [/INST]"""
    # The question asks, in Chinese: "Answer in English: what is fuqi feipian
    # (Sichuan-style sliced beef and offal in chili sauce)?"
    prompt = instruction.format("用英文回答,什么是夫妻肺片?")
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(), max_new_tokens=4096, streamer=streamer)
    
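If you want the completion as a plain string rather than streamed output, decode the generated ids; this small follow-up reuses generate_ids and tokenizer from step 3:

    # Decode the full sequence (prompt + completion) back into text.
    response = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]
    print(response)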

For optimal performance, run the model on a GPU; cloud GPU instances from AWS, Google Cloud, or Azure are a convenient option.

License

The project is distributed under the Apache-2.0 license, allowing for both personal and commercial use.
