BLOOM-7B1
Introduction
BLOOM-7B1 is a large-scale, open-access, multilingual language model developed by the BigScience Workshop. It is designed to advance public research in large language models (LLMs) and is intended for tasks such as text generation and as a pretrained base for further fine-tuning. It is a decoder-only Transformer model trained on text in 45 natural languages and 12 programming languages.
Architecture
BLOOM-7B1 is a modified version of the Megatron-LM GPT-2 architecture. It is a decoder-only Transformer with layer normalization applied to the word embedding layer, ALiBi positional encodings, and GeLU activation functions. The model has 7,069,016,064 parameters in total, of which 1,027,604,480 are embedding parameters, arranged in 30 layers with 32 attention heads and a hidden dimension of 4096. The training sequence length is 2048 tokens. The model was trained on the Jean Zay public supercomputer using 384 A100 80GB GPUs.
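These hyperparameters can be checked against the configuration file published with the checkpoint. The sketch below assumes the model is hosted on the Hugging Face Hub under the name `bigscience/bloom-7b1` and that the configuration exposes the standard Transformers field names for BLOOM models; it reads only the config, not the weights.

```python
# Sketch: inspect the BLOOM-7B1 configuration without downloading the weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/bloom-7b1")

print(config.n_layer)      # number of Transformer layers (expected: 30)
print(config.n_head)       # attention heads per layer (expected: 32)
print(config.hidden_size)  # hidden dimension (expected: 4096)
# The embedding matrix is padded slightly beyond the tokenizer vocabulary;
# vocab_size * hidden_size accounts for the 1,027,604,480 embedding parameters.
print(config.vocab_size)
```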
Training
BLOOM-7B1 was trained on a corpus covering 45 natural languages and 12 programming languages, totaling 1.5TB of pre-processed text converted into 350 billion unique tokens. Text is tokenized with a byte-level Byte Pair Encoding (BPE) tokenizer with no normalization and a vocabulary size of 250,680. Training ran from March 11, 2022 to July 5, 2022, at an estimated cost equivalent to $2-5 million in cloud computing. The model was trained for one epoch, reaching a training loss of 2.3 and a validation loss of 2.9.
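The tokenizer can be loaded on its own to illustrate the byte-level BPE behavior described above. This is a minimal sketch that assumes the `bigscience/bloom-7b1` checkpoint on the Hugging Face Hub; the printed vocabulary size should match the figure quoted above.

```python
# Sketch: load the BLOOM byte-level BPE tokenizer and round-trip some text.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")

print(len(tokenizer))  # vocabulary size; expected to match the 250,680 entries above

# Byte-level BPE with no normalization: the original string round-trips exactly.
text = "BigScience trained BLOOM on 45 natural languages."
ids = tokenizer(text)["input_ids"]
print(ids)                      # token ids
print(tokenizer.decode(ids))    # reproduces the input text
```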
Guide: Running Locally
To use BLOOM-7B1 locally, follow these steps:
- Install Python and create a virtual environment.
- Install the Hugging Face Transformers library with `pip install transformers`.
- Load the model using the `from_pretrained` method from the Transformers library, as shown in the sketch below.
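The following is a minimal sketch of loading the model and generating text. It assumes the `bigscience/bloom-7b1` checkpoint on the Hugging Face Hub and that PyTorch and the `accelerate` package are installed (the latter is needed for `device_map="auto"`); adjust the precision and device placement to your hardware.

```python
# Sketch: load BLOOM-7B1 and generate a short continuation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",          # place layers on available devices (requires accelerate)
)

inputs = tokenizer("The BigScience Workshop is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```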
For optimal performance, especially for inference, it is recommended to run the model on a GPU: in half precision, the 7.1 billion parameters alone occupy roughly 14 GB of memory. Cloud providers such as AWS, Google Cloud, and Azure offer instances with NVIDIA GPUs suitable for running models of this size.
License
BLOOM-7B1 is released under the BigScience RAIL License v1.0. The license is use-based: it outlines permissible uses alongside an explicit list of restrictions, forbidding high-stakes and harmful applications. Users must adhere to these terms and provide attribution when utilizing the model.