XLM-Roberta-large-NER-Spanish

MMG

Introduction

The XLM-Roberta-large-NER-Spanish model is a version of XLM-Roberta-large fine-tuned for Named Entity Recognition (NER) in Spanish. It was fine-tuned on the Spanish portion of the CoNLL-2002 dataset and achieves an F1-score of 89.17, making it one of the strongest NER models currently available for Spanish.

Architecture

The model builds on the XLM-Roberta-large architecture, a multilingual variant of RoBERTa pre-trained on text covering roughly 100 languages. For NER, a token-classification head on top of the encoder assigns an entity label to each token, which is what allows the model to identify and categorize named entities in Spanish text.
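
To see the concrete configuration of the fine-tuned checkpoint (encoder depth and width, and the entity label set of the classification head), you can inspect its config. A minimal sketch, assuming only the transformers library installed in the guide below:

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("MMG/xlm-roberta-large-ner-spanish")
    print(config.num_hidden_layers, config.hidden_size)  # encoder depth and width
    print(config.id2label)                               # labels predicted by the NER head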

Training

The model was fine-tuned on the Spanish portion of the CoNLL-2002 dataset, a standard benchmark for evaluating NER systems. During fine-tuning, the pre-trained weights are updated so that the model learns to recognize and classify the dataset's entity types (person, organization, location, and miscellaneous) in Spanish text.
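
For readers who want to reproduce a comparable fine-tuning run, the sketch below uses the conll2002 dataset (Spanish configuration) from the Hugging Face Hub with the Trainer API. The hyperparameters are illustrative assumptions, not the values used by the model's authors:

    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                              DataCollatorForTokenClassification, TrainingArguments, Trainer)

    dataset = load_dataset("conll2002", "es")
    label_list = dataset["train"].features["ner_tags"].feature.names

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
    model = AutoModelForTokenClassification.from_pretrained(
        "xlm-roberta-large", num_labels=len(label_list))

    def tokenize_and_align(examples):
        # Re-tokenize pre-split words and align word-level NER tags with sub-tokens.
        tokenized = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
        labels = []
        for i, tags in enumerate(examples["ner_tags"]):
            word_ids = tokenized.word_ids(batch_index=i)
            previous, ids = None, []
            for word_id in word_ids:
                if word_id is None:
                    ids.append(-100)           # special tokens: ignored by the loss
                elif word_id != previous:
                    ids.append(tags[word_id])  # label only the first sub-token of a word
                else:
                    ids.append(-100)
                previous = word_id
            labels.append(ids)
        tokenized["labels"] = labels
        return tokenized

    tokenized_dataset = dataset.map(tokenize_and_align, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="xlmr-large-ner-es", num_train_epochs=3,
                               per_device_train_batch_size=8, learning_rate=2e-5),
        train_dataset=tokenized_dataset["train"],
        eval_dataset=tokenized_dataset["validation"],
        data_collator=DataCollatorForTokenClassification(tokenizer),
    )
    trainer.train()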

Guide: Running Locally

To run the XLM-Roberta-large-NER-Spanish model locally, follow these basic steps:

  1. Install Dependencies: Make sure you have Python 3 installed, then install PyTorch and the Hugging Face transformers library.
    pip install transformers torch
    
  2. Load the Model: Use the transformers library to download the fine-tuned checkpoint from the Hugging Face Hub.
    from transformers import AutoTokenizer, AutoModelForTokenClassification
    
    # Download the tokenizer and fine-tuned NER weights from the Hub
    tokenizer = AutoTokenizer.from_pretrained("MMG/xlm-roberta-large-ner-spanish")
    model = AutoModelForTokenClassification.from_pretrained("MMG/xlm-roberta-large-ner-spanish")
    
  3. Inference: Tokenize your input text and run it through the model; the argmax over the logits gives a label id per token. Alternatively, the pipeline sketch after this list handles tokenization and label decoding for you.
    inputs = tokenizer("Las oficinas de MMG están en Las Rozas.", return_tensors="pt")
    outputs = model(**inputs)
    predictions = outputs.logits.argmax(dim=-1)                           # per-token label ids
    labels = [model.config.id2label[p] for p in predictions[0].tolist()]  # readable tags
    
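As an alternative to steps 2 and 3, the transformers pipeline API wraps tokenization, inference, and label decoding in one call. A minimal sketch; the aggregation_strategy value is an illustrative choice for grouping sub-tokens back into whole entities:

    from transformers import pipeline

    ner = pipeline("ner",
                   model="MMG/xlm-roberta-large-ner-spanish",
                   tokenizer="MMG/xlm-roberta-large-ner-spanish",
                   aggregation_strategy="simple")

    # Returns a list of dicts with entity_group, score, word, start, and end
    print(ner("Las oficinas de MMG están en Las Rozas."))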

Cloud GPUs: For faster processing, consider using cloud platforms such as AWS, Google Cloud, or Azure, which offer GPU support.
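
If a GPU is available, locally or on one of these platforms, moving the model and tensors onto it is a small change. A minimal sketch, reusing the model and inputs objects from the steps above:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():    # inference only, no gradients needed
        outputs = model(**inputs)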

License

The model is available under the license terms provided by the creators on the Hugging Face platform. Make sure to review these terms for any usage restrictions or requirements.
