tr11-176B-logs

bigscience

Introduction

The BigScience project is an open, collaborative workshop focused on the study and creation of very large language models, involving over 1,000 researchers worldwide. As part of this initiative, a multilingual language model with 176 billion parameters is being trained, drawing on large-scale computational resources and dedicated engineering work.

Architecture

  • The model follows a 176-billion-parameter decoder-only architecture similar to GPT (a rough parameter-count check follows this list).
  • It comprises 70 layers with 112 attention heads per layer, a hidden dimensionality of 14,336, and a sequence length of 2048 tokens.
  • It employs ALiBi positional embeddings and the GeLU activation function.
  • Further architectural details and optimization strategies can be found in the project’s GitHub repository.
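
As a quick sanity check, the listed hyperparameters roughly reproduce the headline parameter count. The sketch below is a back-of-the-envelope estimate, assuming the standard 12·L·d² approximation for the attention and MLP weights of a decoder-only transformer plus the embedding matrix; exact accounting (biases, layer norms, and the fact that ALiBi adds no learned positional parameters) lives in the project's configuration files.

```python
# Back-of-the-envelope parameter count from the published hyperparameters.
# 12 * L * d^2 is a standard rough estimate for a decoder-only transformer
# (attention + MLP weights); small terms such as biases and layer norms are ignored.

num_layers = 70          # transformer layers
hidden_size = 14_336     # hidden dimensionality
num_heads = 112          # attention heads per layer
seq_length = 2_048       # training sequence length
vocab_size = 250_680     # tokenizer vocabulary

transformer_params = 12 * num_layers * hidden_size ** 2   # ~172.6B
embedding_params = vocab_size * hidden_size               # ~3.6B
total_params = transformer_params + embedding_params

print(f"estimated parameters: {total_params / 1e9:.1f}B")  # ~176.2B
```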

Training

  • The dataset encompasses 46 languages, accumulating 341.6 billion tokens or 1.5 TB of text data.
  • A tokenizer vocabulary of 250,680 tokens is utilized.
  • Training is conducted on 384 A100 GPUs with 80 GB of memory each, taking approximately 3-4 months.
  • One model copy spans 48 GPUs; a checkpoint occupies 329 GB for the bf16 weights and 2.3 TB with optimizer states (a back-of-the-envelope check follows this list).
  • Training throughput is estimated at ~150 TFLOPs per GPU.
  • Environmental considerations are prioritized, with efforts to quantify the carbon footprint and optimize efficiency.
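
The checkpoint sizes quoted above follow from simple per-parameter byte counts. The sketch below is a rough check, assuming bf16 weights (2 bytes each) plus an fp32 Adam-style optimizer state (master weights, momentum, and variance at 4 bytes each); the project's actual optimizer layout is documented in its training logs and may differ in detail.

```python
# Rough check of the checkpoint sizes quoted above.
# Assumption: mixed-precision Adam-style training (bf16 weights + fp32 master
# weights and two fp32 moments), which is a common layout but not confirmed here.

params = 176e9

bf16_bytes = params * 2                   # 2 bytes per bf16 weight
full_bytes = params * (2 + 4 + 4 + 4)     # + fp32 master weights, momentum, variance

print(f"bf16 weights:   {bf16_bytes / 2**30:.0f} GiB")   # ~328 GiB (~329 GB quoted)
print(f"with optimizer: {full_bytes / 2**40:.2f} TiB")   # ~2.24 TiB (~2.3 TB quoted)

# 384 GPUs at 48 GPUs per model copy implies 8 data-parallel replicas.
print(f"data-parallel replicas: {384 // 48}")
```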

Guide: Running Locally

To run similar models locally:

  1. Setup Environment: Install dependencies and set up a Python virtual environment.
  2. Download Code: Clone the repository from GitHub.
  3. Prepare Data: Acquire a suitable dataset and preprocess it with the provided tokenizer (a minimal tokenization sketch follows this list).
  4. Train Model: Utilize the specified training scripts, configuring the hyperparameters as needed.
  5. Evaluate Model: Run evaluation scripts to assess model performance.
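
A minimal sketch of the data-preparation step is shown below. It assumes the Hugging Face transformers library and the publicly hosted bigscience/bloom tokenizer, neither of which is named in this summary; the full-scale training runs use the preprocessing scripts in the project's GitHub repository instead.

```python
# Minimal sketch (not the project's actual pipeline): tokenize raw text with
# the released multilingual tokenizer. Assumes the Hugging Face `transformers`
# package and the publicly hosted "bigscience/bloom" tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
print(tokenizer.vocab_size)  # ~250k-token vocabulary, as noted above

sample = "BigScience trains a 176B-parameter multilingual language model."
encoded = tokenizer(sample, truncation=True, max_length=2048)  # 2048-token context
print(encoded["input_ids"][:10])
```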

For substantial computational needs, consider using cloud GPUs like AWS EC2, Google Cloud GPU instances, or Azure's N-Series VMs.

License

The project is open source, and its code is available under a license that encourages collaboration and sharing within the community. Specific license details can be found in the repository.
