bigscience/bloom-1b7
Introduction
BLOOM-1B7 is a multilingual, transformer-based language model developed by BigScience. It is designed for research on large language models (LLMs) and supports text generation and other NLP tasks. The model is released under the RAIL License v1.0 and supports 48 languages.
Architecture
BLOOM-1B7 employs a modified Megatron-LM GPT-2 architecture. It features a decoder-only design with 24 layers and 16 attention heads, totaling 1.72 billion parameters. The model uses ALiBi positional encodings and GeLU activation functions. Training was conducted on the Jean Zay supercomputer using 64 V100 GPUs.
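The hyperparameters quoted above can be read from the published model configuration. A minimal sketch, assuming the transformers package is installed and the checkpoint is available on the Hugging Face Hub as bigscience/bloom-1b7:

```python
# Sketch: inspect the BLOOM-1B7 configuration; the commented values reflect
# the figures quoted in this section and are what we expect to see printed.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/bloom-1b7")
print(config.n_layer)      # number of transformer layers (24)
print(config.n_head)       # attention heads per layer (16)
print(config.hidden_size)  # hidden dimension of each layer
```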
Training
The model was trained on a diverse dataset comprising 45 natural languages and 12 programming languages, amounting to roughly 1.5 TB of preprocessed text. Training ran from March 11 to May 20, 2022, with the objective of minimizing cross-entropy loss; perplexity is reported as an evaluation metric. A byte-level BPE tokenizer was used, with a vocabulary of 250,680 tokens.
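As a quick check on the tokenizer described above, the following sketch loads it with the transformers library and inspects its vocabulary size; the sample sentence is an arbitrary illustration, not from the source.

```python
# Sketch: load BLOOM's byte-level BPE tokenizer and inspect it.
# Assumes the transformers package is installed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-1b7")
print(tokenizer.vocab_size)                    # reported vocabulary size: 250,680
print(tokenizer.tokenize("Bonjour le monde"))  # byte-level BPE pieces for a French phrase
```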
Guide: Running Locally
To run BLOOM-1B7 locally:
- Ensure your environment meets the model's requirements, including working Python and PyTorch installations.
- Download the model weights from the Hugging Face model hub.
- Load the model with the transformers library for text generation tasks, as sketched below.
- If local hardware is limited, consider cloud GPU instances from providers such as AWS or GCP to handle the computational load.
License
BLOOM-1B7 is licensed under the bigscience-bloom-rail-1.0, which specifies terms for usage and distribution, including restrictions on high-stakes or harmful applications.