xlm-roberta-longformer-base-4096
markussagen
Introduction
The XLM-R Longformer (XLM-Long) extends the XLM-RoBERTa architecture to support sequence lengths of up to 4096 tokens. It is designed to process long sequences efficiently and is aimed in particular at low-resource languages such as Swedish. The model was developed as part of a master's thesis project at Peltarion and is fine-tuned for multilingual question-answering tasks.
Architecture
XLM-Long is based on the XLM-RoBERTa model and incorporates the Longformer pre-training scheme to handle longer context efficiently. This combination allows it to manage extensive token sequences beyond the typical 512-token limit, making it suitable for tasks requiring longer context understanding.
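As a rough sketch of the conversion idea (not the thesis code), the example below extends the 512-token position-embedding table of a pretrained xlm-roberta-base checkpoint to 4096 positions by tiling the learned embeddings, following the commonly used Longformer conversion recipe. It omits the second step of the recipe, swapping full self-attention for Longformer's sliding-window attention before continued pre-training, and the checkpoint name and variable names are illustrative.

import torch
from transformers import XLMRobertaModel, XLMRobertaTokenizerFast

# Illustrative sketch: extend xlm-roberta-base position embeddings from 512 to 4096.
model = XLMRobertaModel.from_pretrained("xlm-roberta-base")
tokenizer = XLMRobertaTokenizerFast.from_pretrained("xlm-roberta-base")

new_max_pos = 4096 + 2                 # RoBERTa-style models reserve two extra position slots
old_embed = model.embeddings.position_embeddings
new_embed = torch.nn.Embedding(new_max_pos, old_embed.embedding_dim, padding_idx=1)

# Keep the two special rows, then tile the pretrained embeddings until the table is full.
new_embed.weight.data[:2] = old_embed.weight.data[:2]
step = old_embed.num_embeddings - 2
k = 2
while k < new_max_pos:
    chunk = min(step, new_max_pos - k)
    new_embed.weight.data[k:k + chunk] = old_embed.weight.data[2:2 + chunk]
    k += chunk

model.embeddings.position_embeddings = new_embed
model.config.max_position_embeddings = new_max_pos
tokenizer.model_max_length = 4096

# Depending on the transformers version, the cached position_ids buffer may also
# need to cover the new maximum length.
if hasattr(model.embeddings, "position_ids"):
    model.embeddings.position_ids = torch.arange(new_max_pos).unsqueeze(0)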
Training
The model was pre-trained on the English WikiText-103 corpus on a single 48GB GPU for 6000 iterations, taking approximately 5 days. Training used NVIDIA Apex for 16-bit precision and gradient accumulation steps to keep the memory footprint of the large model and long sequences manageable. The training script exposes parameters such as the learning rate, weight decay, and batch size.
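For illustration only, the sketch below expresses comparable settings with the Transformers TrainingArguments API; the learning rate, weight decay, batch size, and accumulation values are placeholders rather than the thesis settings, and the original script used NVIDIA Apex directly rather than the built-in fp16 flag.

from transformers import TrainingArguments

# Hypothetical values for illustration; not the original training configuration.
training_args = TrainingArguments(
    output_dir="xlm-long-pretraining",
    max_steps=6000,                    # roughly the 6000 iterations reported above
    learning_rate=3e-5,                # placeholder
    weight_decay=0.01,                 # placeholder
    per_device_train_batch_size=2,     # 4096-token sequences are memory-heavy
    gradient_accumulation_steps=32,    # placeholder; recovers a larger effective batch size
    fp16=True,                         # 16-bit precision
    logging_steps=100,
    save_steps=500,
)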
Guide: Running Locally
To use XLM-R Longformer locally for tasks like question-answering, follow these steps:
- Install the Transformers library:
pip install transformers
- Load the model and tokenizer (an inference example follows this list):
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

MODEL_NAME_OR_PATH = "markussagen/xlm-roberta-longformer-base-4096"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    max_length=4096,
    padding="max_length",
    truncation=True,
)
model = AutoModelForQuestionAnswering.from_pretrained(
    MODEL_NAME_OR_PATH
)
- Use Cloud GPUs: For optimal performance, consider using cloud services offering large GPUs, such as AWS EC2 with GPU instances, Google Cloud, or Azure.
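With the model and tokenizer from the steps above loaded, a minimal extractive question-answering call could look like the following sketch; the Swedish question and context are placeholder strings, and the span-prediction head only gives meaningful answers once the checkpoint has been fine-tuned for QA.

# Illustrative inference with the model and tokenizer loaded above.
question = "Vad heter Sveriges huvudstad?"
context = "Stockholm är Sveriges huvudstad och landets största stad."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start and end positions and decode the answer span.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(answer)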
License
The model is released under the Apache 2.0 License, allowing broad usability and modification with proper attribution.