LongLM-large

thu-coai

Introduction

LongLM-large is a long text generation model developed by the Conversational AI (CoAI) group at Tsinghua University. It is designed for text-to-text generation tasks in Chinese and is built on the Transformer encoder-decoder architecture. The model is pretrained with two tasks, text infilling and conditional continuation, to strengthen its long-text generation capabilities.

Architecture

LongLM-large is based on the transformer architecture and is implemented in PyTorch. It features:

  • Hidden State Dimension ($d_m$): 1,536
  • Feed Forward Layer Dimension ($d_{ff}$): 3,072
  • Key/Value Dimension in Self-Attention ($d_{kv}$): 64
  • Number of Attention Heads ($n_h$): 12
  • Encoder Layers ($n_e$): 24
  • Decoder Layers ($n_d$): 32
  • Number of Parameters (#P): 1 billion
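
Because LongLM is loaded through the T5 classes of the transformers library (see the guide below), these hyperparameters map directly onto the fields of a T5Config. The snippet below is illustrative only: the vocabulary size is an assumption, and the authoritative configuration ships with the released checkpoint.

    from transformers import T5Config

    # Illustrative mapping of the reported hyperparameters onto T5Config fields.
    config = T5Config(
        d_model=1536,           # hidden state dimension (d_m)
        d_ff=3072,              # feed-forward layer dimension (d_ff)
        d_kv=64,                # key/value dimension per attention head (d_kv)
        num_heads=12,           # attention heads (n_h)
        num_layers=24,          # encoder layers (n_e)
        num_decoder_layers=32,  # decoder layers (n_d)
        vocab_size=32128,       # assumed; check the checkpoint's config for the real value
    )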

Training

The training process involves two pretraining tasks:

  1. Text Infilling: Spans of text are masked and replaced with sentinel tokens, and the model learns to reconstruct them. Span lengths are sampled from a Poisson distribution (λ=3), and 15% of the text is masked in total (see the sketch below).
  2. Conditional Continuation: The input is split into two parts, with the model learning to generate the latter part from the former.

The pretraining corpus consists of 120GB of Chinese novels.
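
As a rough illustration of the text-infilling format, the sketch below corrupts a token sequence T5-style: span lengths are drawn from Poisson(λ=3), at most 15% of tokens are masked, and each span is replaced by a sentinel token. This is a simplified reconstruction, not the actual pretraining code, which may differ in detail.

    import numpy as np

    def text_infilling(tokens, mask_ratio=0.15, lam=3):
        """Simplified span corruption: mask ~mask_ratio of tokens in
        Poisson(lam)-length spans, replacing each span with a sentinel."""
        budget = int(len(tokens) * mask_ratio)  # total number of tokens to mask
        source, target = [], []
        i, sentinel = 0, 0
        while i < len(tokens):
            if budget > 0 and np.random.rand() < mask_ratio:
                span = max(1, min(np.random.poisson(lam), budget))
                source.append("<extra_id_%d>" % sentinel)  # masked span in the input
                target.extend(["<extra_id_%d>" % sentinel] + tokens[i:i + span])
                sentinel += 1
                budget -= span
                i += span
            else:
                source.append(tokens[i])
                i += 1
        # Returns (corrupted input tokens, infilling target tokens).
        return source, target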

Guide: Running Locally

To run LongLM-large locally, follow these steps:

  1. Install Dependencies: Ensure you have the following libraries:

    • transformers==4.6.1
    • torch==1.8.1
    • pytorch-lightning==1.2.0
    • Other dependencies like numpy, sentencepiece, etc.
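
    A typical installation, with the versions pinned as listed above (installation via pip is assumed):

    pip install transformers==4.6.1 torch==1.8.1 pytorch-lightning==1.2.0 numpy sentencepiece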
  2. Load the Model:

    import torch
    from transformers import T5Tokenizer, T5ForConditionalGeneration

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = T5Tokenizer.from_pretrained('thu-coai/LongLM-large')  # or a local checkpoint directory
    # Make sure the infilling sentinel tokens are treated as special tokens.
    tokenizer.add_special_tokens({"additional_special_tokens": ["<extra_id_%d>" % d for d in range(100)]})
    model = T5ForConditionalGeneration.from_pretrained('thu-coai/LongLM-large').to(device)
    
  3. Generate Text:

    # Chinese story prompt with a sentinel token marking the span to fill in.
    input_ids = tokenizer("小咕噜对,<extra_id_1>", return_tensors="pt", padding=True, truncation=True, max_length=512).input_ids.to(device)
    gen = model.generate(input_ids, do_sample=True, decoder_start_token_id=1, top_p=0.9, max_length=512)
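
    The output of generate is a tensor of token ids; decoding it back to text with the tokenizer is the standard final step. Keeping special tokens visible makes the filled-in span easy to spot:

    text = tokenizer.batch_decode(gen, skip_special_tokens=False, clean_up_tokenization_spaces=True)
    print(text[0])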
    
  4. Utilize Cloud GPUs: At roughly 1 billion parameters, LongLM-large benefits substantially from GPU inference; cloud services such as AWS, Google Cloud, or Azure can provide suitable hardware if a local GPU is unavailable.

License

The model and its associated resources are provided under a license that allows for research and educational use. For detailed licensing information, refer to the official repository or contact the authors directly.
