xlm-roberta-longformer-base-4096
markussagen
Introduction
The XLM-R Longformer (XLM-Long) extends the XLM-RoBERTa architecture to support sequence lengths of up to 4096 tokens. It is designed to process long sequences efficiently and is aimed in particular at low-resource languages such as Swedish. The model was developed as part of a master's thesis project at Peltarion and is fine-tuned for multilingual question-answering tasks.
Architecture
XLM-Long is based on the XLM-RoBERTa model and incorporates the Longformer pre-training scheme to handle longer context efficiently. This combination allows it to manage extensive token sequences beyond the typical 512-token limit, making it suitable for tasks requiring longer context understanding.
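As a rough sketch of the conversion idea (not the thesis code), the example below extends the 512-token position-embedding table of a pretrained xlm-roberta-base checkpoint to 4096 positions by tiling the learned embeddings, following the commonly used Longformer conversion recipe. It omits the second step of the recipe, swapping full self-attention for Longformer's sliding-window attention before continued pre-training, and the checkpoint name and variable names are illustrative.

import torch
from transformers import XLMRobertaModel, XLMRobertaTokenizerFast

# Illustrative sketch: extend xlm-roberta-base position embeddings from 512 to 4096.
model = XLMRobertaModel.from_pretrained("xlm-roberta-base")
tokenizer = XLMRobertaTokenizerFast.from_pretrained("xlm-roberta-base")

new_max_pos = 4096 + 2                 # RoBERTa-style models reserve two extra position slots
old_embed = model.embeddings.position_embeddings
new_embed = torch.nn.Embedding(new_max_pos, old_embed.embedding_dim, padding_idx=1)

# Keep the two special rows, then tile the pretrained embeddings until the table is full.
new_embed.weight.data[:2] = old_embed.weight.data[:2]
step = old_embed.num_embeddings - 2
k = 2
while k < new_max_pos:
    chunk = min(step, new_max_pos - k)
    new_embed.weight.data[k:k + chunk] = old_embed.weight.data[2:2 + chunk]
    k += chunk

model.embeddings.position_embeddings = new_embed
model.config.max_position_embeddings = new_max_pos
tokenizer.model_max_length = 4096

# Depending on the transformers version, the cached position_ids buffer may also
# need to cover the new maximum length.
if hasattr(model.embeddings, "position_ids"):
    model.embeddings.position_ids = torch.arange(new_max_pos).unsqueeze(0)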
Training
The model was pre-trained on the English WikiText-103 corpus on a single 48GB GPU for 6000 iterations, taking approximately 5 days. Training used NVIDIA Apex for 16-bit precision and gradient accumulation steps to keep the memory footprint of the large model and long sequences manageable. The training script exposes parameters such as the learning rate, weight decay, and batch size.
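For illustration only, the sketch below expresses comparable settings with the Transformers TrainingArguments API; the learning rate, weight decay, batch size, and accumulation values are placeholders rather than the thesis settings, and the original script used NVIDIA Apex directly rather than the built-in fp16 flag.

from transformers import TrainingArguments

# Hypothetical values for illustration; not the original training configuration.
training_args = TrainingArguments(
    output_dir="xlm-long-pretraining",
    max_steps=6000,                    # roughly the 6000 iterations reported above
    learning_rate=3e-5,                # placeholder
    weight_decay=0.01,                 # placeholder
    per_device_train_batch_size=2,     # 4096-token sequences are memory-heavy
    gradient_accumulation_steps=32,    # placeholder; recovers a larger effective batch size
    fp16=True,                         # 16-bit precision
    logging_steps=100,
    save_steps=500,
)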
Guide: Running Locally
To use XLM-R Longformer locally for tasks like question-answering, follow these steps:
- Install the Transformers library:
pip install transformers
- Load the model and tokenizer (an inference example follows this list):
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

MODEL_NAME_OR_PATH = "markussagen/xlm-roberta-longformer-base-4096"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    max_length=4096,
    padding="max_length",
    truncation=True,
)
model = AutoModelForQuestionAnswering.from_pretrained(
    MODEL_NAME_OR_PATH
)
- Use Cloud GPUs: For optimal performance, consider using cloud services offering large GPUs, such as AWS EC2 with GPU instances, Google Cloud, or Azure.
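With the model and tokenizer from the steps above loaded, a minimal extractive question-answering call could look like the following sketch; the Swedish question and context are placeholder strings, and the span-prediction head only gives meaningful answers once the checkpoint has been fine-tuned for QA.

# Illustrative inference with the model and tokenizer loaded above.
question = "Vad heter Sveriges huvudstad?"
context = "Stockholm är Sveriges huvudstad och landets största stad."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start and end positions and decode the answer span.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(answer)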
License
The model is released under the Apache 2.0 License, allowing broad usability and modification with proper attribution.