Llama3.1-8B-Chinese-Chat
shenzhi-wang/Llama3.1-8B-Chinese-Chat
Introduction
Llama3.1-8B-Chinese-Chat is an instruction-tuned language model designed for both Chinese and English users. It is built on the Meta-Llama-3.1-8B-Instruct model and fine-tuned with the ORPO algorithm, with particular emphasis on capabilities such as roleplaying and tool use.
Architecture
The model is based on the Meta-Llama-3.1-8B-Instruct architecture, with 8.03 billion parameters and a context length of 128,000 tokens. It is bilingual (English and Chinese) and has been fine-tuned to improve capabilities such as roleplay and mathematical reasoning.
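As a quick sanity check of these figures, the repository's configuration can be inspected without downloading the weights. This is only an illustrative snippet; the context window reported by the config is expected to be 131,072 positions (≈128K tokens).

```python
from transformers import AutoConfig

# Load only the model configuration (no weights) to confirm the architecture details.
config = AutoConfig.from_pretrained("shenzhi-wang/Llama3.1-8B-Chinese-Chat")
print(config.model_type)               # "llama"
print(config.max_position_embeddings)  # expected ~131072 (≈128K-token context)
```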
Training
The training framework used for this model is LLaMA-Factory. Key training details include:
- Epochs: 3
- Learning Rate: 3e-6
- Scheduler: Cosine
- Warmup Ratio: 0.1
- Context Length: 8192
- ORPO Beta: 0.05
- Batch Size: 128
- Fine-tuning: Full Parameters
- Optimizer: Paged AdamW 32-bit
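For illustration only, the hyperparameters above can be written out as an ORPO training configuration. The sketch below uses TRL's ORPOConfig rather than LLaMA-Factory (which is what was actually used to train this model), so the field names, placeholder paths, and the per-device batch / gradient-accumulation split are assumptions, not the authors' exact setup:

```python
from trl import ORPOConfig

# Illustrative mapping of the listed hyperparameters onto TRL's ORPOConfig.
# NOTE: the model was trained with LLaMA-Factory, not TRL; the output path and
# the batch-size split below are placeholders/assumptions.
orpo_args = ORPOConfig(
    output_dir="outputs/llama3.1-8b-chinese-chat-orpo",  # placeholder
    num_train_epochs=3,
    learning_rate=3e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_length=8192,                 # fine-tuning context length
    beta=0.05,                       # ORPO beta
    per_device_train_batch_size=4,   # assumption: 4 per device with
    gradient_accumulation_steps=32,  # 32 accumulation steps -> global batch size 128
    optim="paged_adamw_32bit",
    bf16=True,
)
```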
Guide: Running Locally
- Environment Setup:
  Ensure you have the latest version of the transformers package (version 4.43.0 or later).
- Download the Model:
  Use the Python script below to download the BF16 version of the model:

  ```python
  from huggingface_hub import snapshot_download

  # Download the BF16 weights, skipping the GGUF files.
  snapshot_download(
      repo_id="shenzhi-wang/Llama3.1-8B-Chinese-Chat",
      ignore_patterns=["*.gguf"],
  )
  ```
- Model Inference:
  Run the script below to chat with the model, replacing model_id with your local download path:

  ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "/Your/Local/Path/to/Llama3.1-8B-Chinese-Chat"
  dtype = torch.bfloat16

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      device_map="cuda",
      torch_dtype=dtype,
  )

  # Example prompt: "Write a poem about machine learning."
  chat = [{"role": "user", "content": "写一首关于机器学习的诗。"}]
  input_ids = tokenizer.apply_chat_template(
      chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)

  outputs = model.generate(
      input_ids,
      max_new_tokens=8192,
      do_sample=True,
      temperature=0.6,
      top_p=0.9,
  )
  # Strip the prompt tokens and decode only the newly generated text.
  response = outputs[0][input_ids.shape[-1]:]
  print(tokenizer.decode(response, skip_special_tokens=True))
  ```
- Using GGUF Models:
  - Download GGUF models from the specified folder.
  - Use them with LM Studio or follow the instructions from llama.cpp; a minimal llama-cpp-python sketch is shown after this list.
- Cloud GPU Recommendation:
  For optimal performance, consider using cloud-based GPUs such as AWS EC2 instances with NVIDIA GPUs or Google Cloud's AI Platform.
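As a minimal sketch of the GGUF route, the snippet below loads a GGUF file with llama-cpp-python. The file name and local path are placeholders, since the exact GGUF variants available in the repository are not listed here.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path: point this at the GGUF file you downloaded from the repo.
llm = Llama(
    model_path="/Your/Local/Path/to/llama3.1-8b-chinese-chat.gguf",
    n_ctx=8192,       # context window to allocate
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

# Example prompt: "Write a poem about machine learning."
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "写一首关于机器学习的诗。"}],
    temperature=0.6,
    top_p=0.9,
)
print(output["choices"][0]["message"]["content"])
```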
License
This model is licensed under the Llama-3.1 License. For more information, please refer to the license document.