LongLM-large
Introduction
LongLM-large is a text generation model developed by the Conversational AI (CoAI) group at Tsinghua University. It is an encoder-decoder transformer designed for Chinese text-to-text generation and is pretrained with multiple tasks to strengthen its text generation capabilities.
Architecture
LongLM-large is based on the transformer architecture and is implemented in PyTorch. It features:
- Hidden State Dimension ($d_m$): 1,536
- Feed Forward Layer Dimension ($d_{ff}$): 3,072
- Key/Value Dimension in Self-Attention ($d_{kv}$): 64
- Number of Attention Heads ($n_h$): 12
- Encoder Layers ($n_e$): 24
- Decoder Layers ($n_d$): 32
- Number of Parameters (#P): 1 billion
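These hyperparameters describe a T5-style encoder-decoder (the model is loaded with T5ForConditionalGeneration in the guide below). As a minimal sketch, they could be expressed as a Hugging Face T5Config roughly as follows; the vocabulary size here is an assumption, and the configuration file shipped with the checkpoint is authoritative.

from transformers import T5Config

config = T5Config(
    vocab_size=32128,        # assumed placeholder; use the value from the released checkpoint
    d_model=1536,            # hidden state dimension (d_m)
    d_ff=3072,               # feed-forward layer dimension (d_ff)
    d_kv=64,                 # key/value dimension per attention head (d_kv)
    num_heads=12,            # attention heads (n_h)
    num_layers=24,           # encoder layers (n_e)
    num_decoder_layers=32,   # decoder layers (n_d)
)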
Training
The training process involves two pretraining tasks:
- Text Infilling: Spans of text are masked and replaced with sentinel tokens, and the model learns to reconstruct them. Span lengths are sampled from a Poisson distribution (λ=3), with about 15% of the text masked (see the sketch at the end of this section).
- Conditional Continuation: The input is split into two parts, with the model learning to generate the latter part from the former.
The pretraining corpus consists of 120GB of Chinese novels.
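As a toy illustration of the two pretraining formats, the sketch below uses invented English sentences and span choices; the actual pretraining data is Chinese novel text, and the sentinel format follows the <extra_id_N> tokens registered in the running guide below.

# Illustrative only: toy examples of the two pretraining objectives.

# 1) Text infilling: selected spans are replaced with sentinel tokens on the
#    encoder side, and the decoder reconstructs them
#    (span lengths ~ Poisson(λ=3), ~15% of the text masked).
source         = "the quick brown fox jumps over the lazy dog"
encoder_input  = "the <extra_id_0> fox jumps over the <extra_id_1> dog"
decoder_target = "<extra_id_0> quick brown <extra_id_1> lazy <extra_id_2>"

# 2) Conditional continuation: the text is split into two parts, and the
#    decoder generates the second part conditioned on the first.
encoder_input_continuation  = "the quick brown fox jumps"
decoder_target_continuation = "over the lazy dog"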
Guide: Running Locally
To run LongLM-large locally, follow these steps:
- Install Dependencies: Ensure you have the following libraries:
  transformers==4.6.1
  torch==1.8.1
  pytorch-lightning==1.2.0
- Other dependencies such as numpy and sentencepiece are also required.
- Load the Model:
  from transformers import T5Tokenizer, T5ForConditionalGeneration

  tokenizer = T5Tokenizer.from_pretrained('LongLM-large')
  # Register the <extra_id_N> sentinel tokens used for text infilling.
  tokenizer.add_special_tokens({"additional_special_tokens": ["<extra_id_%d>" % d for d in range(100)]})
  model = T5ForConditionalGeneration.from_pretrained('LongLM-large')
- Generate Text (a complete end-to-end sketch follows this list):
  # Note: `device` must refer to the device the model was moved to,
  # e.g. device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  input_ids = tokenizer("小咕噜对,<extra_id_1>", return_tensors="pt", padding=True, truncation=True, max_length=512).input_ids.to(device)
  gen = model.generate(input_ids, do_sample=True, decoder_start_token_id=1, top_p=0.9, max_length=512)
- Utilize Cloud GPUs: Consider using cloud services such as AWS, Google Cloud, or Azure for access to more powerful GPUs and better performance.
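Putting the steps together, the following is a minimal end-to-end sketch, assuming the checkpoint is available under the path or identifier 'LongLM-large'; the final decoding step is added for illustration and is not part of the original snippets.

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer and model (path/identifier assumed; adjust to your checkpoint location).
tokenizer = T5Tokenizer.from_pretrained('LongLM-large')
tokenizer.add_special_tokens({"additional_special_tokens": ["<extra_id_%d>" % d for d in range(100)]})
model = T5ForConditionalGeneration.from_pretrained('LongLM-large')

# Move the model to a GPU if one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Encode a prompt containing a sentinel token and sample a continuation.
input_ids = tokenizer("小咕噜对,<extra_id_1>", return_tensors="pt",
                      padding=True, truncation=True, max_length=512).input_ids.to(device)
gen = model.generate(input_ids, do_sample=True, decoder_start_token_id=1,
                     top_p=0.9, max_length=512)

# Decode the generated ids back to text, dropping special tokens.
print(tokenizer.batch_decode(gen, skip_special_tokens=True)[0])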
License
The model and its associated resources are provided under a license that allows for research and educational use. For detailed licensing information, refer to the official repository or contact the authors directly.