BLOOMZ-560M
bigscience
Introduction
BLOOMZ-560M is part of the BLOOMZ and mT0 family of models, designed for crosslingual generalization: following human instructions in many languages without task-specific training (zero-shot). These models are produced by finetuning the pretrained multilingual language models BLOOM and mT5 on the crosslingual task mixture xP3.
Architecture
The model architecture is based on the BLOOM-560M configuration. Finetuning involved 1,750 steps with 3.67 billion tokens, using a setup with 1x pipeline parallelism, 1x tensor parallelism, and 1x data parallelism at float16 precision. The hardware setup included 64 A100 80GB GPUs distributed over 8 nodes.
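Because BLOOMZ-560M inherits its configuration from BLOOM-560M, the architectural hyperparameters can be read directly from the published checkpoint. The following is a minimal sketch using the transformers library (installation is covered in the guide below); the printed attributes are those of the BloomConfig class:

  from transformers import AutoConfig

  # Download only the configuration file, not the model weights.
  config = AutoConfig.from_pretrained("bigscience/bloomz-560m")

  # Hyperparameters inherited from the BLOOM-560M pretrained model.
  print("layers:", config.n_layer)
  print("hidden size:", config.hidden_size)
  print("attention heads:", config.n_head)
  print("vocabulary size:", config.vocab_size)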
Training
Training utilized AMD CPUs with 512GB memory per node and a communication network using NCCL with a fully dedicated subnet. The software stack included Megatron-DeepSpeed for orchestration, DeepSpeed for optimization and parallelism, and PyTorch for neural network implementations.
Guide: Running Locally
To run BLOOMZ-560M locally, follow these steps:
- Install the required libraries:

  pip install -q transformers accelerate
- Load the model and tokenizer:

  from transformers import AutoModelForCausalLM, AutoTokenizer

  checkpoint = "bigscience/bloomz-560m"
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")
- Generate text (a fuller generation example follows these steps):

  inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to(model.device)
  outputs = model.generate(inputs)
  print(tokenizer.decode(outputs[0]))
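The generate call above uses default settings, which produce only a short continuation. The sketch below is illustrative rather than taken verbatim from the model card: the prompt and the max_new_tokens budget are assumptions, and it reuses the tokenizer and model loaded in the previous step:

  # Illustrative zero-shot instruction in another language, with a larger output budget.
  prompt = 'Suggest at least five related search terms to "Mạng neural nhân tạo".'
  inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
  outputs = model.generate(inputs, max_new_tokens=64)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))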
For enhanced performance, consider using cloud GPUs such as NVIDIA A100s available on platforms like AWS, Google Cloud, or Azure.
License
The BLOOMZ-560M model is licensed under the BigScience BLOOM RAIL 1.0 license.