BLOOM-560M
bigscience

Introduction
BLOOM-560M is a multilingual language model developed by BigScience to facilitate public research on large language models (LLMs). It is a transformer-based model that can be used for text generation and as a pretrained base for fine-tuning on downstream tasks. The model is released under the BigScience RAIL License v1.0 and covers 45 natural languages and 12 programming languages.
Architecture
The architecture is a decoder-only transformer based on a modified Megatron-LM GPT2 codebase. It uses ALiBi positional encodings and GELU activation functions, and comprises 559,214,592 parameters across 24 layers with 16 attention heads, a hidden dimension of 1024, and a sequence length of 2048 tokens. The model is trained with a cross-entropy objective on the Jean Zay public supercomputer.
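These hyperparameters are exposed through the model's published configuration. The sketch below, which assumes the model is hosted on the Hugging Face Hub under the ID `bigscience/bloom-560m`, loads that configuration with the `transformers` library and prints the figures quoted above:

```python
from transformers import AutoConfig

# Hub ID assumed to be "bigscience/bloom-560m".
config = AutoConfig.from_pretrained("bigscience/bloom-560m")

print(config.n_layer)      # expected: 24 transformer layers
print(config.n_head)       # expected: 16 attention heads per layer
print(config.hidden_size)  # expected: 1024-dimensional hidden states
```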
Training
BLOOM-560M was trained on 1.5TB of pre-processed text spanning 45 natural languages and 12 programming languages, amounting to 350 billion unique tokens. Training ran for a single epoch on 384 A100 80GB GPUs, at an estimated cost equivalent to $2-5M in cloud computing resources. Training began on March 11, 2022, and concluded on July 5, 2022. The model uses a byte-level Byte Pair Encoding (BPE) tokenizer with a vocabulary size of 250,680.
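As an illustration of the byte-level BPE tokenizer, the minimal sketch below (again assuming the `bigscience/bloom-560m` Hub ID) loads it, checks the vocabulary size, and round-trips a short sentence; it is not part of the official training pipeline:

```python
from transformers import AutoTokenizer

# Hub ID assumed; the tokenizer is a byte-level BPE with ~250,680 entries.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
print(len(tokenizer))  # vocabulary size

# Round-trip a multilingual sentence through the tokenizer.
ids = tokenizer("BLOOM parle plusieurs langues.")["input_ids"]
print(ids)
print(tokenizer.decode(ids))
```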
Guide: Running Locally
To run the BLOOM-560M model locally, follow these steps:
- Install dependencies: Ensure you have Python and PyTorch installed, along with the `transformers` library from Hugging Face.
- Download the model: Use the `transformers` library to fetch the BLOOM-560M weights.
- Set up the environment: Prepare a Python environment with the necessary dependencies.
- Execution: Load the model and tokenizer, then use them for text generation (see the sketch after this list).
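The following is a minimal, end-to-end sketch of those steps using `transformers` and PyTorch. The Hub ID `bigscience/bloom-560m`, the example prompt, and the sampling settings are illustrative assumptions rather than prescribed values:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bigscience/bloom-560m"  # assumed Hugging Face Hub ID

# Load the tokenizer and model; use half precision on GPU to reduce memory usage.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Encode an example prompt and generate a continuation.
prompt = "BigScience is an open research collaboration that"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```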
Given the model's size, using cloud GPUs like those offered by AWS, Google Cloud, or Azure is recommended for efficient execution.
License
BLOOM-560M is released under the BigScience RAIL License v1.0, which includes specific usage restrictions, particularly concerning misuse and out-of-scope applications. The full license text is published by BigScience and linked from the model card.