BLOOMZ-560M
bigscience
Introduction
BLOOMZ-560M is part of the BLOOMZ and mT0 family of models, designed for crosslingual generalization: following human instructions in many languages without task-specific training (zero-shot). These models are produced by finetuning the pretrained multilingual language models BLOOM and mT5 on the crosslingual task mixture xP3.
Architecture
The model architecture is based on the BLOOM-560M configuration. Finetuning involved 1,750 steps with 3.67 billion tokens, using a setup with 1x pipeline parallelism, 1x tensor parallelism, and 1x data parallelism at float16 precision. The hardware setup included 64 A100 80GB GPUs distributed over 8 nodes.
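Because BLOOMZ-560M inherits its configuration from BLOOM-560M, the architectural hyperparameters can be read directly from the published checkpoint. The following is a minimal sketch using the transformers library (installation is covered in the guide below); the printed attributes are those of the BloomConfig class:

  from transformers import AutoConfig

  # Download only the configuration file, not the model weights.
  config = AutoConfig.from_pretrained("bigscience/bloomz-560m")

  # Hyperparameters inherited from the BLOOM-560M pretrained model.
  print("layers:", config.n_layer)
  print("hidden size:", config.hidden_size)
  print("attention heads:", config.n_head)
  print("vocabulary size:", config.vocab_size)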
Training
Training utilized AMD CPUs with 512GB memory per node and a communication network using NCCL with a fully dedicated subnet. The software stack included Megatron-DeepSpeed for orchestration, DeepSpeed for optimization and parallelism, and PyTorch for neural network implementations.
Guide: Running Locally
To run BLOOMZ-560M locally, follow these steps:
- Install the required libraries:

  pip install -q transformers accelerate
- Load the model and tokenizer:

  from transformers import AutoModelForCausalLM, AutoTokenizer

  checkpoint = "bigscience/bloomz-560m"
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")
- Generate text (a fuller generation example follows these steps):

  inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to(model.device)
  outputs = model.generate(inputs)
  print(tokenizer.decode(outputs[0]))
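The generate call above uses default settings, which produce only a short continuation. The sketch below is illustrative rather than taken verbatim from the model card: the prompt and the max_new_tokens budget are assumptions, and it reuses the tokenizer and model loaded in the previous step:

  # Illustrative zero-shot instruction in another language, with a larger output budget.
  prompt = 'Suggest at least five related search terms to "Mạng neural nhân tạo".'
  inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
  outputs = model.generate(inputs, max_new_tokens=64)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))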
For enhanced performance, consider using cloud GPUs such as NVIDIA A100s available on platforms like AWS, Google Cloud, or Azure.
License
The BLOOMZ-560M model is licensed under the BigScience BLOOM RAIL 1.0 license.