LongLM-large
Introduction
LongLM-large is a text generation model developed by the Conversational AI (CoAI) group at Tsinghua University. It is an encoder-decoder transformer designed for Chinese text-to-text generation and is pretrained with multiple tasks to strengthen its text generation capabilities.
Architecture
LongLM-large is based on the transformer architecture and is implemented in PyTorch. It features:
- Hidden State Dimension ($d_m$): 1,536
- Feed Forward Layer Dimension ($d_{ff}$): 3,072
- Key/Value Dimension in Self-Attention ($d_{kv}$): 64
- Number of Attention Heads ($n_h$): 12
- Encoder Layers ($n_e$): 24
- Decoder Layers ($n_d$): 32
- Number of Parameters (#P): 1 billion
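These hyperparameters describe a T5-style encoder-decoder (the model is loaded with T5ForConditionalGeneration in the guide below). As a minimal sketch, they could be expressed as a Hugging Face T5Config roughly as follows; the vocabulary size here is an assumption, and the configuration file shipped with the checkpoint is authoritative.

from transformers import T5Config

config = T5Config(
    vocab_size=32128,        # assumed placeholder; use the value from the released checkpoint
    d_model=1536,            # hidden state dimension (d_m)
    d_ff=3072,               # feed-forward layer dimension (d_ff)
    d_kv=64,                 # key/value dimension per attention head (d_kv)
    num_heads=12,            # attention heads (n_h)
    num_layers=24,           # encoder layers (n_e)
    num_decoder_layers=32,   # decoder layers (n_d)
)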
Training
The training process involves two pretraining tasks:
- Text Infilling: Spans of text are masked and replaced with sentinel tokens, and the model learns to reconstruct them. Span lengths are sampled from a Poisson distribution (λ=3), with about 15% of the text masked (see the sketch at the end of this section).
- Conditional Continuation: The input is split into two parts, with the model learning to generate the latter part from the former.
The pretraining corpus consists of 120GB of Chinese novels.
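As a toy illustration of the two pretraining formats, the sketch below uses invented English sentences and span choices; the actual pretraining data is Chinese novel text, and the sentinel format follows the <extra_id_N> tokens registered in the running guide below.

# Illustrative only: toy examples of the two pretraining objectives.

# 1) Text infilling: selected spans are replaced with sentinel tokens on the
#    encoder side, and the decoder reconstructs them
#    (span lengths ~ Poisson(λ=3), ~15% of the text masked).
source         = "the quick brown fox jumps over the lazy dog"
encoder_input  = "the <extra_id_0> fox jumps over the <extra_id_1> dog"
decoder_target = "<extra_id_0> quick brown <extra_id_1> lazy <extra_id_2>"

# 2) Conditional continuation: the text is split into two parts, and the
#    decoder generates the second part conditioned on the first.
encoder_input_continuation  = "the quick brown fox jumps"
decoder_target_continuation = "over the lazy dog"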
Guide: Running Locally
To run LongLM-large locally, follow these steps:
- Install Dependencies: Ensure you have the following libraries:
  transformers==4.6.1
  torch==1.8.1
  pytorch-lightning==1.2.0
- Other dependencies such as numpy and sentencepiece are also required.
- Load the Model:
  from transformers import T5Tokenizer, T5ForConditionalGeneration

  tokenizer = T5Tokenizer.from_pretrained('LongLM-large')
  # Register the <extra_id_N> sentinel tokens used for text infilling.
  tokenizer.add_special_tokens({"additional_special_tokens": ["<extra_id_%d>" % d for d in range(100)]})
  model = T5ForConditionalGeneration.from_pretrained('LongLM-large')
- Generate Text (a complete end-to-end sketch follows this list):
  # Note: `device` must refer to the device the model was moved to,
  # e.g. device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  input_ids = tokenizer("小咕噜对,<extra_id_1>", return_tensors="pt", padding=True, truncation=True, max_length=512).input_ids.to(device)
  gen = model.generate(input_ids, do_sample=True, decoder_start_token_id=1, top_p=0.9, max_length=512)
- Utilize Cloud GPUs: Consider using cloud services such as AWS, Google Cloud, or Azure for access to more powerful GPUs and better performance.
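Putting the steps together, the following is a minimal end-to-end sketch, assuming the checkpoint is available under the path or identifier 'LongLM-large'; the final decoding step is added for illustration and is not part of the original snippets.

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer and model (path/identifier assumed; adjust to your checkpoint location).
tokenizer = T5Tokenizer.from_pretrained('LongLM-large')
tokenizer.add_special_tokens({"additional_special_tokens": ["<extra_id_%d>" % d for d in range(100)]})
model = T5ForConditionalGeneration.from_pretrained('LongLM-large')

# Move the model to a GPU if one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Encode a prompt containing a sentinel token and sample a continuation.
input_ids = tokenizer("小咕噜对,<extra_id_1>", return_tensors="pt",
                      padding=True, truncation=True, max_length=512).input_ids.to(device)
gen = model.generate(input_ids, do_sample=True, decoder_start_token_id=1,
                     top_p=0.9, max_length=512)

# Decode the generated ids back to text, dropping special tokens.
print(tokenizer.batch_decode(gen, skip_special_tokens=True)[0])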
License
The model and its associated resources are provided under a license that allows for research and educational use. For detailed licensing information, refer to the official repository or contact the authors directly.