BLOOM (bigscience/bloom)
Introduction
BLOOM is a multilingual large language model developed by the BigScience collaboration. It is designed to generate coherent text in 46 natural languages and 13 programming languages. As a transformer-based large language model, BLOOM performs text tasks by continuing a prompt, which makes it applicable to text generation, information extraction, and summarization, among other uses.
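As a quick illustration of task-by-prompt-continuation, here is a minimal sketch using the `transformers` pipeline API. The smaller `bigscience/bloom-560m` checkpoint and the prompt text are assumptions chosen so the example runs on modest hardware; the full model is used the same way.

```python
# Minimal sketch: BLOOM performs tasks by continuing a prompt.
# The small bloom-560m checkpoint is assumed here for illustration;
# substitute "bigscience/bloom" for the full 176B-parameter model.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")

prompt = "Translate to French: I like coffee. Translation:"
print(generator(prompt, max_new_tokens=30)[0]["generated_text"])
```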
Architecture
BLOOM uses a decoder-only transformer architecture adapted from Megatron-LM GPT2. It features ALiBi positional embeddings, GeLU activation functions, and a layer normalization applied to the word-embedding layer. The model has 176 billion parameters, of which 3.6 billion are embedding parameters, spread across 70 layers with 112 attention heads. It supports a sequence length of 2,048 tokens and was trained with a cross-entropy objective.
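These hyperparameters can be read directly from the published configuration without downloading the weights. A minimal sketch, assuming the `bigscience/bloom` repository id on the Hugging Face Hub:

```python
# Minimal sketch: inspect BLOOM's architecture hyperparameters from its config.
# Only config.json is fetched, not the 176B-parameter weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/bloom")

print("layers:         ", config.n_layer)      # transformer layers (70)
print("attention heads:", config.n_head)       # heads per layer (112)
print("hidden size:    ", config.hidden_size)  # model/embedding dimension
print("vocab size:     ", config.vocab_size)   # tokenizer vocabulary size
```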
Training
BLOOM was trained on 1.6 TB of pre-processed text, amounting to roughly 350 billion unique tokens across its 46 natural languages and 13 programming languages. Training took place on the Jean Zay public supercomputer in France, using 384 A100 GPUs. The objective was to minimize cross-entropy loss, and the model reached a final perplexity of 7.045. The environmental impact of training was taken into account: the supercomputer runs primarily on nuclear energy, which limits the carbon footprint of the run.
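The reported perplexity is simply the exponential of the mean cross-entropy loss, so the objective and the final metric are two views of the same quantity. A small sketch of that relationship:

```python
# Minimal sketch: perplexity = exp(mean cross-entropy loss in nats per token).
import math

final_perplexity = 7.045
implied_loss = math.log(final_perplexity)  # mean cross-entropy per token
print(f"A final perplexity of {final_perplexity} corresponds to a "
      f"cross-entropy loss of about {implied_loss:.2f} nats/token")
```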
Guide: Running Locally
- Installation: Ensure you have Python installed along with the `transformers` and `accelerate` libraries.
- Download the Model: Use the Hugging Face model hub to download BLOOM (`bigscience/bloom`).
- Inference: Load the model with the `transformers` library and run inference on your text data, as sketched after this list.
- Hardware Recommendation: For optimal performance, consider cloud GPU services such as AWS or Google Cloud, which offer powerful GPU instances.
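The sketch below shows one way to run inference locally, assuming `transformers` and `accelerate` are installed. The dtype and device placement are assumptions, and the full 176B-parameter checkpoint needs several hundred GB of memory, so a smaller BLOOM checkpoint is a practical stand-in for testing.

```python
# Hedged sketch of local inference with transformers + accelerate.
# device_map="auto" (provided via accelerate) places layers across the
# available GPUs and CPU automatically.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom"  # swap in a smaller checkpoint on limited hardware

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # requires accelerate
    torch_dtype=torch.bfloat16,  # halves memory relative to float32
)

inputs = tokenizer(
    "BLOOM can write text in many languages, for example",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```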
License
BLOOM is released under the BigScience BLOOM RAIL License v1.0, which includes specific usage restrictions. Users are encouraged to review the license details to understand the permissible and prohibited use cases.