BLOOMZ-1B1
Introduction
BLOOMZ-1B1 is a multilingual language model developed by the BigScience Workshop. It is part of the BLOOMZ and mT0 family of models, which can follow human instructions in dozens of languages zero-shot, i.e. without any task-specific training. The model was finetuned on the xP3 dataset, a crosslingual mixture of tasks with English prompts, to strengthen its cross-lingual generalization.
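The instruction style it expects is plain natural language. A few illustrative zero-shot prompts (the phrasings below are our own examples, not an official list):

  # Illustrative zero-shot prompts; any natural-language instruction works.
  prompts = [
      "Translate to English: Je t’aime.",
      'Suggest at least five related search terms to "Mạng neural nhân tạo".',
      "Explain in a sentence in Telugu what is backpropagation in neural networks.",
  ]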
Architecture
BLOOMZ-1B1 reuses the architecture of the pretrained bloom-1b1 model, a decoder-only transformer from the BLOOM family; finetuning changed the weights, not the design. It handles dozens of natural languages as well as several programming languages, making it versatile for a wide range of tasks, and it is optimized for text generation with inputs in English, Spanish, French, and many others.
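Because the design is inherited from bloom-1b1, its key dimensions can be read directly from the published configuration; a minimal sketch using the Hugging Face API (the printed values come from the checkpoint itself):

  from transformers import AutoConfig

  # The config exposes the BLOOM decoder-only hyperparameters inherited
  # from bloom-1b1: layer count, hidden size, and attention-head count.
  config = AutoConfig.from_pretrained("bigscience/bloomz-1b1")
  print(config.model_type)                                  # "bloom"
  print(config.n_layer, config.hidden_size, config.n_head)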
Training
BLOOMZ-1B1 was finetuned on 502 million tokens over 250 steps. Training combined pipeline, tensor, and data parallelism across 64 A100 80GB GPUs distributed over multiple nodes, orchestrated with the Megatron-DeepSpeed and DeepSpeed libraries. The model was trained in FP16 precision and uses PyTorch for its neural network operations.
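For a flavor of how DeepSpeed enters the picture, here is a minimal sketch of an FP16 configuration; the values are illustrative placeholders, not BigScience's actual settings, and in practice the Megatron-DeepSpeed training scripts supply the model and the pipeline/tensor parallelism on top of this:

  import deepspeed

  # Minimal illustrative DeepSpeed config: FP16 with ZeRO stage 1.
  # Batch sizes here are placeholders, not the values used for BLOOMZ.
  ds_config = {
      "train_micro_batch_size_per_gpu": 1,
      "gradient_accumulation_steps": 4,
      "fp16": {"enabled": True},
      "zero_optimization": {"stage": 1},
  }

  # `model` would come from the Megatron-DeepSpeed training script:
  # engine, optimizer, _, _ = deepspeed.initialize(
  #     model=model, model_parameters=model.parameters(), config=ds_config)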
Guide: Running Locally
To run BLOOMZ-1B1 locally, follow these steps:
- Install dependencies:

  pip install transformers accelerate

- Load the model:

  from transformers import AutoModelForCausalLM, AutoTokenizer

  # device_map="auto" places the weights on the available GPU(s),
  # falling back to CPU if none is present.
  checkpoint = "bigscience/bloomz-1b1"
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")

- Prepare input and generate output:

  # Use model.device so the inputs land wherever the model was placed,
  # rather than hard-coding "cuda".
  inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to(model.device)
  outputs = model.generate(inputs)
  print(tokenizer.decode(outputs[0]))  # should print an English translation, e.g. "I love you."
For optimal performance, consider using cloud GPUs such as those available on AWS, Google Cloud, or Azure.
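If GPU memory is tight, the same checkpoint can also be loaded with 8-bit weights via bitsandbytes (an optional variant; assumes pip install bitsandbytes alongside accelerate):

  # Optional: 8-bit quantized weights to reduce GPU memory usage.
  model = AutoModelForCausalLM.from_pretrained(
      checkpoint,
      device_map="auto",
      load_in_8bit=True,
  )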
License
BLOOMZ-1B1 is released under the BigScience BLOOM RAIL 1.0 license, which governs its use and distribution.