BLOOM-1B1
bigscience
Introduction
BLOOM-1B1 is a large multilingual language model developed by the BigScience project. Built on a transformer architecture, it is designed for text generation and can be fine-tuned for a variety of downstream tasks. The model supports 48 languages and is released under the bigscience-bloom-rail-1.0 license.
Architecture
BLOOM-1B1 uses a modified Megatron-LM GPT-2 architecture: a decoder-only transformer with layer normalization, ALiBi positional encodings, and GELU activation functions. It has 1,065,314,304 parameters distributed across 24 layers with 16 attention heads, and a maximum sequence length of 2,048 tokens.
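As a quick sanity check on these hyperparameters, the published configuration can be inspected with the Transformers library. This is a minimal sketch, assuming transformers is installed and the checkpoint is available on the Hugging Face Hub as bigscience/bloom-1b1:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Fetch only the configuration file (no weights are downloaded for this step).
config = AutoConfig.from_pretrained("bigscience/bloom-1b1")

print(config.n_layer)      # number of transformer layers (expected: 24)
print(config.n_head)       # attention heads per layer (expected: 16)
print(config.hidden_size)  # hidden dimension
print(config.vocab_size)   # tokenizer vocabulary size

# Loading the full model allows counting parameters directly.
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b1")
print(sum(p.numel() for p in model.parameters()))  # roughly 1.07 billion
```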
Training
The model was trained on the Jean Zay Public Supercomputer in France on 384 A100 80GB GPUs. Training ran for one epoch, from March 11, 2022 to July 5, 2022, at an estimated cost equivalent to $2-5 million in cloud compute. The training corpus comprises 1.5 TB of pre-processed text spanning 45 natural languages and 12 programming languages, converted into roughly 350 billion unique tokens.
Guide: Running Locally
To run BLOOM-1B1 locally, follow these steps:
- Set up environment: Ensure you have Python and PyTorch installed.
- Install dependencies: Use pip to install the Transformers library, which pulls in Hugging Face's tokenizers and the other required packages.
- Download the model: Access the model from Hugging Face's model hub.
- Load the model: Use the Transformers library to load BLOOM-1B1.
- Run inference: Provide a prompt and generate text with the model (see the sketch after this list).
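The steps above condense into a short script. This is a minimal sketch, assuming PyTorch and transformers are installed (for example via pip install torch transformers) and that the checkpoint is fetched from the Hugging Face Hub under the ID bigscience/bloom-1b1; generation parameters are illustrative and can be adjusted:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-1b1"  # Hugging Face Hub checkpoint

# Load tokenizer and model; weights are downloaded on first use and cached locally.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Move the model to a GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Encode a prompt and generate a continuation.
prompt = "BLOOM is a multilingual language model that"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,   # length of the generated continuation
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Alternatively, the higher-level pipeline("text-generation", model="bigscience/bloom-1b1") helper wraps the same load-and-generate steps in a single call, at the cost of less direct control over the model object.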
For optimal performance, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
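On a cloud GPU instance, loading the weights in half precision roughly halves memory use compared with float32 and typically speeds up inference. A hedged sketch, assuming a CUDA-capable GPU and recent versions of PyTorch and transformers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Half-precision weights cut GPU memory use roughly in half versus float32,
# which comfortably fits BLOOM-1B1 on a single modern cloud GPU.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b1",
    torch_dtype=torch.float16,
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-1b1")

# The generation code from the previous sketch works unchanged with this model.
```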
License
BLOOM-1B1 is released under the bigscience-bloom-rail-1.0 license, which includes use restrictions intended to prevent misuse in high-stakes settings or for malicious purposes. For more details, refer to the full license text.