Megrez-3B-Instruct
Introduction
Megrez-3B-Instruct, developed by Infinigence AI, is a language model designed for high-speed inference and ease of use. It is optimized for text generation and supports both Chinese and English. High accuracy, fast inference, and simple deployment make it suitable for a range of applications, including WebSearch.
Architecture
- Base Architecture: Llama-2 with grouped-query attention (GQA; see the sketch after this list)
- Context Length: 32K tokens
- Total Parameters: 2.92B
- Backbone Parameters: 2.29B (excluding Embeddings and Softmax)
- Vocabulary Size: 122,880
- Training Data: 3T tokens
- Supported Languages: Chinese and English
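In the GQA variant referenced above, each key/value head is shared by a group of query heads, which shrinks the KV cache and speeds up decoding. Below is a minimal PyTorch sketch of the idea; every dimension is an illustrative toy value, not one of Megrez-3B's actual hyperparameters.

    import torch

    batch, seq_len = 2, 16
    n_q_heads, n_kv_heads, head_dim = 8, 2, 64  # toy sizes: 4 query heads per KV head

    q = torch.randn(batch, n_q_heads, seq_len, head_dim)
    k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
    v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

    # Repeat each KV head across its query group; the KV cache stays
    # n_q_heads / n_kv_heads (here 4x) smaller than full multi-head attention.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
    out = attn @ v  # (batch, n_q_heads, seq_len, head_dim)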
Training
Megrez-3B-Instruct combines software and hardware optimizations to achieve high inference speed, reportedly up to 300% faster than models of comparable accuracy. It deliberately balances structural innovation against compatibility with mainstream model architectures, which keeps development and deployment complexity low.
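Speed comparisons like this depend heavily on the hardware, batch size, and decoding settings used. As a sanity check, the short sketch below measures decoding throughput in tokens per second; it assumes model, tokenizer, and device have been set up as in the guide that follows, and that a CUDA GPU is available.

    import time
    import torch

    # Build a single-prompt input; the prompt text itself is arbitrary.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": "Briefly explain grouped-query attention."}],
        return_tensors="pt", add_generation_prompt=True,
    ).to(device)

    torch.cuda.synchronize()
    start = time.perf_counter()
    out = model.generate(inputs, max_new_tokens=256, do_sample=False)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    # Throughput = newly generated tokens / wall-clock seconds.
    new_tokens = out.shape[1] - inputs.shape[1]
    print(f"{new_tokens / elapsed:.1f} tokens/s")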
Guide: Running Locally
To run the Megrez-3B-Instruct model locally, follow these steps:
- Install Transformers:

    pip install transformers torch accelerate

  torch and accelerate are added here because the example below loads the model with torch_dtype and device_map, which require them.
- Run the Model:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    path = "Infinigence/Megrez-3B-Instruct"
    device = "cuda"

    tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,
        device_map=device,
        trust_remote_code=True,
    )

    # Prompt: "Tell me how to make huang men ji (braised chicken)."
    messages = [{"role": "user", "content": "讲讲黄焖鸡的做法。"}]
    model_inputs = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(device)

    model_outputs = model.generate(
        model_inputs,
        do_sample=True,
        max_new_tokens=1024,
        top_p=0.9,
        temperature=0.2,
    )

    # Keep only the newly generated tokens, dropping the prompt.
    output_token_ids = [
        model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
    ]
    responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
    print(responses)

  To stream tokens as they are generated, see the TextStreamer sketch after these steps.
- Cloud GPUs: For optimal performance, consider cloud-based GPUs, such as those from AWS, Google Cloud, or Azure.
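For interactive use, tokens can be printed as they are generated instead of waiting for the full completion. A small variant of the generate call above, using transformers' built-in TextStreamer; model, tokenizer, and model_inputs are the objects created in the Run the Model step.

    from transformers import TextStreamer

    # Stream decoded text to stdout, skipping the echoed prompt.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(
        model_inputs,
        streamer=streamer,
        do_sample=True,
        max_new_tokens=1024,
        top_p=0.9,
        temperature=0.2,
    )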
License
The code and model are open-sourced under the Apache-2.0 license. Be aware that the model can hallucinate; using the WebSearch feature is advised when factual accuracy matters. For mathematical or logical tasks, lowering the temperature or disabling sampling is recommended for more consistent results, as sketched below. The model is released with data-compliance considerations in mind; however, Infinigence AI disclaims responsibility for any issues arising from its use.
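As one concrete way to apply that advice, the sketch below reuses the model, tokenizer, and device from the guide above and disables sampling entirely, so repeated runs of the same math prompt return the same answer; the exact settings are illustrative, not an official recommendation.

    # Deterministic decoding for math/logic prompts: greedy search instead of sampling.
    math_messages = [{"role": "user", "content": "What is 17 * 23?"}]
    math_inputs = tokenizer.apply_chat_template(
        math_messages, return_tensors="pt", add_generation_prompt=True
    ).to(device)
    math_outputs = model.generate(math_inputs, do_sample=False, max_new_tokens=256)
    print(tokenizer.decode(math_outputs[0][math_inputs.shape[1]:], skip_special_tokens=True))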