Introduction

BLOOMZ-560M is part of the BLOOMZ and mT0 family of models, which are designed for crosslingual generalization: they can follow human instructions in many languages without task-specific training (zero-shot). These models are produced by finetuning pretrained multilingual language models (BLOOM and mT5) on the crosslingual task mixture xP3.

Architecture

The model architecture is based on the BLOOM-560M configuration. Finetuning ran for 1,750 steps over 3.67 billion tokens at float16 precision, using 1x pipeline parallelism, 1x tensor parallelism, and 1x data parallelism. The hardware consisted of 64 A100 80GB GPUs distributed over 8 nodes.
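
As a quick check of this configuration, the architecture hyperparameters can be read directly from the Hub checkpoint with the transformers library. A minimal sketch, assuming transformers is installed (see the install step below) and using the bigscience/bloomz-560m checkpoint referenced later in this guide:

    from transformers import AutoConfig

    # Load the BLOOM-560M-derived configuration from the Hub
    config = AutoConfig.from_pretrained("bigscience/bloomz-560m")

    # Inspect a few architecture parameters: layer count, hidden size, attention heads
    print(config.n_layer, config.hidden_size, config.n_head)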

Training

Training nodes used AMD CPUs with 512GB of memory each, connected by an NCCL communications network on a fully dedicated subnet. The software stack comprised Megatron-DeepSpeed for orchestration, DeepSpeed for optimization and parallelism, and PyTorch for the neural network implementation.

Guide: Running Locally

To run BLOOMZ-560M locally, follow these steps:

  1. Install Required Libraries:

    pip install -q transformers accelerate
    
  2. Load Model and Tokenizer:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    checkpoint = "bigscience/bloomz-560m"
    
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")
    
  3. Generate Text:

    # Move the prompt to the same device the model was loaded on (GPU if available)
    inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to(model.device)
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0]))
    

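By default, generate returns only a short continuation; for longer or cleaner output you can pass standard generation arguments. A minimal sketch, reusing the model, tokenizer, and inputs from the steps above (the token budget shown is an arbitrary example, not a recommended setting):

    # Allow up to 50 newly generated tokens and drop special tokens when decoding
    outputs = model.generate(inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
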
For enhanced performance, consider using cloud GPUs such as NVIDIA A100s available on platforms like AWS, Google Cloud, or Azure.

License

The BLOOMZ-560M model is licensed under the BigScience BLOOM RAIL 1.0 license.
