bloomz
bigscience
Introduction
BLOOMZ is part of a family of models (the BLOOM-based BLOOMZ models and the mT5-based mT0 models) developed to perform tasks in many languages without task-specific finetuning. The models have been finetuned on the crosslingual task mixture xP3, which improves their ability to generalize to unseen tasks and languages from natural-language instructions.
Architecture
The BLOOMZ models share the architecture of the original BLOOM models and were finetuned with a multitask approach on the xP3 dataset. Across the family, checkpoint sizes range from roughly 300 million to 176 billion parameters, so a variant can be chosen to match the available compute and performance needs.
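As a quick check that BLOOMZ reuses the BLOOM decoder-only architecture, the configuration of a published checkpoint can be inspected without downloading any weights. The sketch below assumes the bigscience/bloomz-560m checkpoint, one of the smaller variants; any other size can be substituted.
from transformers import AutoConfig

# Load only the configuration (no weights) of a small BLOOMZ variant.
# The checkpoint name is an example; swap in any other published size.
config = AutoConfig.from_pretrained("bigscience/bloomz-560m")

print(config.model_type)   # "bloom": BLOOMZ reuses the BLOOM decoder-only architecture
print(config.n_layer)      # number of transformer blocks
print(config.hidden_size)  # hidden/embedding dimension
print(config.n_head)       # attention heads per layer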
Training
The models were finetuned on 2.09 billion tokens. Training ran on 288 A100 80GB GPUs, with AMD CPUs and 512GB of memory per node. The process was orchestrated with Megatron-DeepSpeed and DeepSpeed for parallelism and optimization, with PyTorch serving as the neural network library.
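For orientation only, the snippet below sketches the shape of a DeepSpeed configuration of the kind consumed by Megatron-DeepSpeed; the values are illustrative assumptions, not the settings actually used for BLOOMZ.
import json

# Illustrative DeepSpeed-style configuration (values are assumptions, not the
# actual BLOOMZ settings). Megatron-DeepSpeed reads a JSON file like this to
# control batch sizes, precision, and ZeRO optimizer-state partitioning.
ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 1},
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)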
Guide: Running Locally
Steps
- Install Required Libraries:
  pip install transformers
- Load Model and Tokenizer:
  from transformers import AutoModelForCausalLM, AutoTokenizer

  checkpoint = "bigscience/bloomz"
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  model = AutoModelForCausalLM.from_pretrained(checkpoint)
- Execute a Task (e.g., Translation):
  inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
  outputs = model.generate(inputs)
  print(tokenizer.decode(outputs[0]))
  By default, generate produces only a short continuation; a variant with an explicit output length is sketched after this list.
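Building on the last step, the sketch below sets an explicit generation budget with max_new_tokens and strips special tokens when decoding; the prompt text is just an assumed example.
# Reuses the model and tokenizer loaded above; caps output at 40 new tokens.
inputs = tokenizer.encode("Explain in one sentence why the sky is blue.", return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))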
Cloud GPUs
For enhanced performance, consider cloud GPU services such as AWS EC2, Google Cloud, or Azure, which offer instances with large amounts of GPU memory. The full bigscience/bloomz checkpoint has 176 billion parameters and does not fit on a single consumer GPU, so smaller variants such as bloomz-7b1 or bloomz-560m are more practical for local experimentation; a memory-conscious load is sketched below.
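Assuming a machine with one or more GPUs and the accelerate package installed, a memory-conscious load might look like the following; the bloomz-7b1 checkpoint is an assumption chosen as a mid-sized variant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Mid-sized variant assumed here; replace with another checkpoint as needed.
checkpoint = "bigscience/bloomz-7b1"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# device_map="auto" (requires the accelerate package) spreads layers across the
# available devices; bfloat16 halves memory use relative to float32 weights.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to(model.device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))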
License
The BLOOMZ models are released under the BigScience RAIL License v1.0 (bigscience-bloom-rail-1.0), a responsible-AI license that permits open use subject to use-based restrictions.