Introduction

UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. It employs a Mixture-of-Denoisers (MoD), a pretraining objective that combines diverse denoising paradigms. UL2 also introduces mode switching, which associates downstream fine-tuning with specific pretraining schemes via mode tokens. The model achieves state-of-the-art performance on multiple NLP tasks, outperforming models such as T5 and GPT-3.

Architecture

UL2 uses 32 encoder layers and 32 decoder layers, with a model dimension (d_model) of 4096 and a feed-forward dimension (d_ff) of 16384. There are 16 attention heads, each with a dimension of 256. The architecture uses a model-parallelism degree of 8 and the same SentencePiece tokenizer as T5, with a vocabulary size of 32,000.
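
These hyperparameters can be checked against the released checkpoint without downloading the weights. The field names below assume the Hugging Face T5Config schema that the google/ul2 checkpoint uses:

    from transformers import AutoConfig

    # Fetches only config.json (a few KB), not the ~40GB of weights.
    config = AutoConfig.from_pretrained("google/ul2")
    print(config.num_layers, config.num_decoder_layers)  # encoder/decoder depth: 32, 32
    print(config.d_model, config.d_ff)                   # 4096, 16384
    print(config.d_kv, config.num_heads)                 # 256 per head, 16 heads
    print(config.vocab_size)                             # ~32k SentencePiece vocabulary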

Training

Pretraining is performed on the C4 corpus for 1 trillion tokens over 2 million steps, using a batch size of 1024 and input/target sequence lengths of 512/512. Pretraining took slightly over a month, with dropout set to 0. Fine-tuning occurs after every 50k to 100k pretraining steps and continues until state-of-the-art performance is reached, for a total of 2.65 million steps.
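
As a quick sanity check, the stated schedule is consistent with the 1-trillion-token budget (a back-of-the-envelope calculation, counting input tokens only):

    steps = 2_000_000       # pretraining steps
    batch = 1024            # sequences per batch
    input_len = 512         # input tokens per sequence
    print(steps * batch * input_len)  # 1_048_576_000_000, i.e. ~1 trillion tokens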

Mixture of Denoisers

  • R-Denoiser: Regular span corruption that masks about 15% of input tokens with relatively short spans, as in T5.
  • S-Denoiser: Prefix language modeling with a strict sequential order, splitting the input into a context prefix and a target suffix.
  • X-Denoiser: Extreme denoising that masks approximately 50% of the input sequence (via long spans and/or high corruption rates), simulating the generation of long targets from limited information.
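
To make the three paradigms concrete, here is a minimal, illustrative sketch of the corruption patterns on a toy token list. The helper names, span lengths, and sampling scheme are simplifications for illustration, not the exact UL2 implementation:

    import random

    def span_corrupt(tokens, mask_rate, mean_span_len=3):
        """Toy span corruption: replace random spans with T5-style sentinels.
        mask_rate ~ 0.15 approximates the R-denoiser, ~ 0.5 the X-denoiser."""
        corrupted, targets, i, sentinel = [], [], 0, 0
        while i < len(tokens):
            # Start a span with probability mask_rate / mean_span_len so that
            # roughly mask_rate of all tokens end up masked on average.
            if random.random() < mask_rate / mean_span_len:
                targets.append((f"<extra_id_{sentinel}>", tokens[i:i + mean_span_len]))
                corrupted.append(f"<extra_id_{sentinel}>")
                sentinel += 1
                i += mean_span_len
            else:
                corrupted.append(tokens[i])
                i += 1
        return corrupted, targets

    def prefix_lm(tokens, split=0.75):
        """Toy S-denoiser: the prefix is visible context, the suffix is the target."""
        cut = int(len(tokens) * split)
        return tokens[:cut], tokens[cut:]

    toks = "the quick brown fox jumps over the lazy dog".split()
    print(span_corrupt(toks, mask_rate=0.15))  # R-style: few short masked spans
    print(span_corrupt(toks, mask_rate=0.50))  # X-style: about half the input masked
    print(prefix_lm(toks))                     # S-style: context prefix / target suffix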

Guide: Running Locally

To run the UL2 model locally, you'll need roughly 40 GB of GPU memory (e.g., an A100 40GB): the 20B-parameter checkpoint takes about 40 GB in bfloat16 (20B parameters × 2 bytes each). Below are the basic steps:

  1. Installation: Ensure you have the necessary libraries, e.g., PyTorch and Hugging Face Transformers, installed as shown below.
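
     A typical setup (package versions are unpinned here; accelerate is included because low_cpu_mem_usage=True relies on it):

    pip install torch transformers accelerate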

  2. Model Loading: Load the model using the following Python code:

    from transformers import T5ForConditionalGeneration, AutoTokenizer
    import torch

    # Load the checkpoint in bfloat16 to halve memory use; low_cpu_mem_usage
    # streams weights in to avoid holding a second full copy in CPU RAM.
    model = T5ForConditionalGeneration.from_pretrained(
        "google/ul2", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16
    ).to("cuda")

    # The tokenizer is the same SentencePiece vocabulary used by T5.
    tokenizer = AutoTokenizer.from_pretrained("google/ul2")
    
  3. Example Usage: Prepend the appropriate mode token ([S2S], [NLU], or [NLG]) to the input string and generate with the model, as in the sketch below.
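
     A minimal sketch (the prompt text and generation settings are illustrative, not prescriptive; <extra_id_0> marks where generation should continue):

    # [S2S] selects the S-denoiser (prefix-LM) mode; [NLU] and [NLG] select
    # the R- and X-denoiser modes, respectively.
    input_string = "[S2S] Paris is the capital of France. It is known for <extra_id_0>"
    inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")
    outputs = model.generate(inputs, max_length=200)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))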

  4. Cloud GPUs: If suitable hardware is not available locally, cloud services such as AWS, GCP, or Azure offer GPUs with enough memory for a model of this size.

License

The UL2 model is licensed under the Apache 2.0 License, permitting use, modification, and distribution, provided that the original terms and conditions are met.