Baichuan-7B
Introduction
Baichuan-7B is an open-source large-scale pretrained language model developed by Baichuan Intelligent Technology. It is based on the Transformer architecture, has 7 billion parameters, and was trained on approximately 1.2 trillion tokens. The model supports both Chinese and English, with a context window of 4096 tokens, and achieves state-of-the-art results for its size on authoritative Chinese and English benchmarks (C-Eval and MMLU).
Architecture
Baichuan-7B follows the standard Transformer design, similar to LLaMA, with specific enhancements (a minimal sketch of the normalization and feedforward components follows this list):
- Position Embedding: Uses rotary position embedding (RoPE) for strong length extrapolation.
- Feedforward Layer: Employs SwiGLU, increasing the hidden layer size to 11,008.
- Layer Normalization: Uses RMSNorm-based Pre-Normalization.
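For illustration, here is a minimal PyTorch sketch of the RMSNorm and SwiGLU feedforward components listed above, using the model's dimensions. Module and parameter names are assumptions chosen for readability, not Baichuan-7B's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Pre-normalization that rescales by the root-mean-square of activations."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)

class SwiGLUFeedForward(nn.Module):
    """SwiGLU MLP: SiLU-gated projection up to the 11,008-dim hidden layer and back down."""
    def __init__(self, dim: int = 4096, hidden_dim: int = 11008):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```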
Key parameters include (see the configuration check after this list):
- Number of parameters: 7,000,559,616
- Number of layers: 32
- Number of heads: 32
- Model dimension (d_model): 4096
- Vocabulary size: 64,000
- Sequence length: 4096
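These values can be verified from the released configuration. A minimal sketch follows; the attribute names assume the usual transformers config conventions rather than anything specific to this model card.

```python
from transformers import AutoConfig

# Load the published config; attribute names are assumed to follow standard
# transformers conventions.
config = AutoConfig.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
print(config.hidden_size)          # expected: 4096
print(config.num_hidden_layers)    # expected: 32
print(config.num_attention_heads)  # expected: 32
print(config.vocab_size)           # expected: 64000
```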
Training
For detailed training settings, refer to the Baichuan-7B GitHub repository. The model has been optimized for Chinese using proprietary bilingual corpora and allows efficient fine-tuning for downstream tasks.
Guide: Running Locally
To perform inference using Baichuan-7B, follow these steps:
- Install the necessary libraries:
```bash
pip install transformers torch
```
- Load the model and tokenizer:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", device_map="auto", trust_remote_code=True)
```
- Prepare input data and perform inference:
```python
inputs = tokenizer('Your input text here', return_tensors='pt').to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```
- A GPU is recommended for inference; if local hardware is insufficient, consider cloud GPU instances from providers such as AWS, Google Cloud, or Azure.
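If GPU memory is tight, the model can usually be loaded in half precision. A hedged variant of the loading step above (torch_dtype is a standard from_pretrained argument; verify that the reduced precision is acceptable for your task):

```python
import torch
from transformers import AutoModelForCausalLM

# Load weights in float16 to roughly halve GPU memory use (illustrative sketch).
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-7B",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
```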
License
Baichuan-7B is released under a permissive open-source license that allows commercial use. For more details, see the Baichuan-7B License.