Introduction

Baichuan-7B is an open-source large-scale pre-trained language model developed by Baichuan Intelligent Technology. It is based on the Transformer architecture, has 7 billion parameters, and was trained on approximately 1.2 trillion tokens. The model supports both Chinese and English, with a context window of 4096 tokens, and achieves state-of-the-art results among models of its size on the authoritative Chinese and English benchmarks C-Eval and MMLU.

Architecture

Baichuan-7B follows the standard Transformer design similar to LLaMA, with the following enhancements (a minimal code sketch of these components follows this list):

  • Position Embedding: Uses rotary position embeddings (RoPE), which extrapolate well to longer sequences.
  • Feedforward Layer: Uses SwiGLU, with the feedforward hidden size set to 11,008.
  • Layer Normalization: Applies pre-normalization based on RMSNorm.
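
For illustration, here is a minimal PyTorch sketch of the RMSNorm and SwiGLU components described above; the class and parameter names are illustrative and do not necessarily mirror Baichuan's actual implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RMSNorm(nn.Module):
        """Root-mean-square layer normalization: no mean subtraction, no bias."""
        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(dim))
            self.eps = eps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Scale activations by the inverse RMS, then apply a learned per-channel gain.
            rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return self.weight * (x * rms)

    class SwiGLU(nn.Module):
        """SwiGLU feedforward block: silu(x @ W_gate) * (x @ W_up), projected back to d_model."""
        def __init__(self, d_model: int = 4096, d_ff: int = 11008):
            super().__init__()
            self.gate_proj = nn.Linear(d_model, d_ff, bias=False)
            self.up_proj = nn.Linear(d_model, d_ff, bias=False)
            self.down_proj = nn.Linear(d_ff, d_model, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))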

Key parameters include (these can be checked against the published configuration, as shown after the list):

  • Number of parameters: 7,000,559,616
  • Number of layers: 32
  • Number of heads: 32
  • Model dimension (d_model): 4096
  • Vocabulary size: 64,000
  • Sequence length: 4096
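
These values can be verified without downloading the weights by loading the model configuration from the Hugging Face Hub. The attribute names below assume a LLaMA-style configuration, which the remote Baichuan code appears to follow; confirm them against the printed config.

    from transformers import AutoConfig

    # Load only the configuration (no weights are downloaded).
    config = AutoConfig.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)

    print(config.hidden_size)              # expected: 4096
    print(config.num_hidden_layers)        # expected: 32
    print(config.num_attention_heads)      # expected: 32
    print(config.vocab_size)               # expected: 64000
    print(config.max_position_embeddings)  # expected: 4096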

Training

For detailed training settings, refer to the Baichuan-7B GitHub repository. The model is optimized for Chinese using proprietary bilingual corpora and can be fine-tuned efficiently for downstream tasks (a LoRA-based sketch follows).
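
As an illustration of lightweight fine-tuning, the sketch below attaches LoRA adapters using the peft library. The target module name "W_pack" (Baichuan's fused query/key/value projection) is an assumption about the remote modeling code; inspect model.named_modules() to confirm it before training.

    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained(
        "baichuan-inc/Baichuan-7B",
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,
    )

    # "W_pack" is assumed to be the fused QKV projection in Baichuan's remote code;
    # adjust target_modules if the module names differ in the loaded model.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["W_pack"],
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the small adapter matrices are trainable

With this setup, a standard transformers Trainer or a custom training loop updates only the adapter weights, keeping memory and storage requirements far below full fine-tuning.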

Guide: Running Locally

To perform inference using Baichuan-7B, follow these steps:

  1. Install the necessary libraries (accelerate is required for device_map="auto"; the Baichuan tokenizer may also need sentencepiece):
    pip install transformers torch accelerate sentencepiece
    
  2. Load the model and tokenizer:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    # trust_remote_code=True lets Transformers load Baichuan's custom tokenizer and model classes from the Hub
    tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", device_map="auto", trust_remote_code=True)
    
  3. Prepare the input and run inference (this example assumes the model's first device is a GPU at cuda:0):
    inputs = tokenizer('Your input text here', return_tensors='pt').to('cuda:0')
    # Greedy decoding by default; repetition_penalty discourages repeated phrases
    pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
    print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
    
  4. Consider using cloud GPUs (for example on AWS, Google Cloud, or Azure) for better performance; if GPU memory is tight, half-precision loading is sketched below.
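
As a rough guide, 7 billion parameters occupy about 28 GB in float32 and about 14 GB in float16, so half precision is usually needed to fit the model on a single 16–24 GB GPU. A minimal sketch of this option:

    import torch
    from transformers import AutoModelForCausalLM

    # Half-precision weights roughly halve GPU memory use relative to float32.
    model = AutoModelForCausalLM.from_pretrained(
        "baichuan-inc/Baichuan-7B",
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,
    )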

License

Baichuan-7B is released under a permissive open-source license that allows commercial use. For more details, see the Baichuan-7B License.
