Chinese Llama 2 7b 4bit
LinkSoul
Introduction
Chinese LLaMA-2-7B-4BIT is a fully open-source, commercially usable model based on Llama 2. It supports both Chinese and English and follows the Llama-2-chat prompt format, so optimizations built for the original Llama-2-chat model remain compatible.
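To make the Llama-2-chat format concrete, here is a minimal sketch of a single-turn prompt builder. The function name `build_prompt` and the default system prompt are hypothetical, and the layout assumes the commonly documented Llama-2-chat template (system prompt wrapped in `<<SYS>>` tags inside one `[INST]` block); it is not taken from the LinkSoul code.

```python
def build_prompt(user_message: str,
                 system_prompt: str = "You are a helpful assistant.") -> str:
    """Wrap one user message in a Llama-2-chat style single-turn prompt.

    Assumption: the tokenizer adds the beginning-of-sequence token itself,
    so it is not included in the string here.
    """
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"


# The same template works for Chinese and English input.
prompt = build_prompt("用英文回答,什么是夫妻肺片?", "Be brief.")
print(prompt)
```

Keeping the template in one helper makes it harder for prompts in different parts of an application to drift away from the format the model was tuned on.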
Architecture
The model is a 7-billion-parameter variant of Llama 2, quantized to 4 bits for efficient deployment. It supports both Chinese and English and is tailored for text generation tasks.
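A rough back-of-the-envelope calculation shows why 4-bit quantization helps deployment. The sketch below assumes a nominal 7 billion parameters and counts weight storage only; it ignores quantization metadata, activations, and the key-value cache, so real memory use will be higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # nominal parameter count, an approximation

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # half-precision weights
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"fp16 weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{int4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the space of the half-precision weights.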
Training
The model is trained using a dataset titled "instruction_merge_set," which includes 10 million entries in both English and Chinese. The training and inference code is available on GitHub, facilitating replication and further development.
Guide: Running Locally
To run the model locally, follow these steps:
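- Install Dependencies: Ensure you have PyTorch and Transformers installed. Loading with load_in_4bit=True and device_map='auto' also requires the bitsandbytes and accelerate packages.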
- Load the Model: Use AutoTokenizer and AutoModelForCausalLM from the Transformers library.

      import torch
      from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

      model_path = "LinkSoul/Chinese-Llama-2-7b-4bit"

      # Load the slow (non-fast) tokenizer
      tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
      # Load the 4-bit weights and let device_map place them automatically
      model = AutoModelForCausalLM.from_pretrained(
          model_path,
          load_in_4bit=True,
          torch_dtype=torch.float16,
          device_map='auto'
      )

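- Generate Text: Use the generate method to create text based on a prompt.

      # Llama-2-chat template; the system prompt is abbreviated here
      instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant...<</SYS>>\n\n{} [/INST]"""
      # The question reads: "Answer in English: what is Fuqi Feipian?"
      prompt = instruction.format("用英文回答,什么是夫妻肺片?")
      generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.to(model.device), max_new_tokens=4096)
      # Decode the generated token IDs back into text
      print(tokenizer.decode(generate_ids[0], skip_special_tokens=True))
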
Inference is much faster on a GPU. If you do not have a suitable one locally, cloud GPU instances from AWS, Google Cloud, or Azure are an option.
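For decoder-only models, generate returns the prompt tokens followed by the newly generated tokens, so the decoded string usually starts with the prompt. The sketch below shows one simple way to keep only the reply, by splitting on the closing `[/INST]` marker. The helper name `extract_reply` and the sample string are hypothetical; the sample is illustrative text, not real model output.

```python
def extract_reply(decoded_text: str, marker: str = "[/INST]") -> str:
    """Return the text after the last instruction-closing marker."""
    # rsplit keeps everything after the final marker; if the marker is
    # absent, the whole string is returned (with surrounding whitespace removed).
    return decoded_text.rsplit(marker, 1)[-1].strip()


# Illustrative decoded string shaped like prompt + reply (not real output).
decoded = (
    "[INST] <<SYS>>\nBe brief.\n<</SYS>>\n\n"
    "What is Fuqi Feipian? [/INST] A Sichuan cold dish."
)
reply = extract_reply(decoded)
print(reply)
```

An alternative is to slice off the first N token IDs, where N is the prompt length, before decoding; that avoids relying on the marker text appearing in the output.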
License
The project is distributed under the Apache-2.0 license, allowing for both personal and commercial use.