Introduction

BLOOMZ-1B1 is a multilingual language model developed by the BigScience Workshop. It is part of the BLOOMZ and mT0 model family, which can follow human instructions in dozens of languages zero-shot, i.e. without task-specific training. The model is finetuned on xP3, a crosslingual mixture of tasks and prompts, to improve its cross-lingual generalization.

Architecture

BLOOMZ-1B1 uses the same architecture as the pretrained bloom-1b1 checkpoint: a decoder-only transformer language model. It supports both natural languages (English, Spanish, French, and many others) and programming languages, making it applicable to a wide range of text-generation tasks.
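As a quick illustration of the multilingual support (assuming the transformers package is installed and the Hugging Face Hub is reachable), the byte-level BPE tokenizer shared by the BLOOM family encodes text in any of these languages without out-of-vocabulary failures; the example strings below are arbitrary:

```python
from transformers import AutoTokenizer

# Load the tokenizer used by the BLOOM/BLOOMZ checkpoints
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-1b1")

for text in ["Hello, world!", "¿Cómo estás?", "print('bonjour')"]:
    ids = tokenizer.encode(text)
    # Byte-level BPE round-trips arbitrary input strings exactly
    assert tokenizer.decode(ids, skip_special_tokens=True) == text
```

Because the tokenizer operates on bytes, no input language can produce unknown tokens; unseen scripts simply decompose into more tokens.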

Training

BLOOMZ-1B1 was finetuned on 502 million tokens over 250 steps. Training combined pipeline, tensor, and data parallelism across 64 A100 80GB GPUs distributed over multiple nodes, orchestrated with the Megatron-DeepSpeed and DeepSpeed libraries. Finetuning used FP16 precision, with PyTorch providing the neural-network operations.
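These figures imply a large per-step token budget, which can be checked with quick arithmetic (variable names are illustrative):

```python
# Reported finetuning budget for BLOOMZ-1B1
total_tokens = 502_000_000
steps = 250

# Tokens consumed per optimization step
tokens_per_step = total_tokens // steps
print(tokens_per_step)  # 2008000, i.e. roughly 2 million tokens per step
```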

Guide: Running Locally

To run BLOOMZ-1B1 locally, follow these steps:

  1. Install Dependencies:

    pip install transformers accelerate
    
  2. Load the Model:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    checkpoint = "bigscience/bloomz-1b1"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")
    
  3. Prepare Input and Generate Output:

    inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")  # drop .to("cuda") on CPU-only machines
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0]))
    

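If no GPU is available, the same steps run on CPU by loading the model without device_map and keeping tensors on the default device. A sketch (generation will be slow for a 1B-parameter model, and the first run downloads the weights):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-1b1"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # fp32 weights on CPU

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
# max_new_tokens bounds the completion length; greedy decoding is the default
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that generate returns the prompt tokens followed by the completion, so the decoded string begins with the original prompt.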
For optimal performance, consider using cloud GPUs such as those available on AWS, Google Cloud, or Azure.

License

BLOOMZ-1B1 is released under the BigScience RAIL License v1.0 (bigscience-bloom-rail-1.0), a responsible-AI license that governs its use and distribution.
