SRL-EN_MBERT-BASE

liaad

Introduction

The model SRL-EN_MBERT-BASE is a fine-tuned version of bert-base-multilingual-cased focused on semantic role labeling (SRL) for English, using CoNLL-formatted OntoNotes v5.0 data. This model is part of a broader project involving various SRL models for both Portuguese and English.

Architecture

SRL-EN_MBERT-BASE uses bert-base-multilingual-cased as its backbone, fine-tuned specifically for semantic role labeling. Because the backbone is multilingual, the English training data could be preprocessed to align with the Portuguese PropBank.Br datasets used elsewhere in the project.

Training

Training involved preprocessing the English data to match the Portuguese datasets, and the model was trained for five epochs. Evaluation used the CoNLL-2012 dataset, with additional testing on the PropBank.Br and Buscapé datasets. The model achieved an F1 score of 63.07 in-domain (PropBank.Br) and 58.56 out-of-domain (Buscapé).
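For context, SRL systems are typically scored with precision, recall, and F1 over predicted argument spans, where a span counts as correct only if its label and boundaries exactly match the gold annotation. A minimal illustration of that computation (the spans below are invented, not taken from the project's evaluation):

```python
# Illustrative span-level F1, as commonly used for SRL evaluation.
# Each span is (label, start_token, end_token); these examples are invented.
gold = {("ARG0", 0, 1), ("V", 2, 2), ("ARG1", 3, 5)}
pred = {("ARG0", 0, 1), ("V", 2, 2), ("ARG1", 3, 4)}

correct = len(gold & pred)        # spans matching label and boundaries exactly
precision = correct / len(pred)
recall = correct / len(gold)
f1 = 2 * precision * recall / (precision + recall)
print(round(100 * f1, 2))  # 66.67
```

Note that the partially overlapping ARG1 span earns no credit under exact matching, which is why span-level F1 is a strict metric.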

Guide: Running Locally

To run the model locally:

  1. Install Transformers Library: Ensure you have the transformers library installed via pip:

    pip install transformers
    
  2. Load the Model and Tokenizer:

    from transformers import AutoTokenizer, AutoModel
    
    # Note: this loads the encoder weights only; producing SRL tags also
    # requires the decoding layer from the project's repository.
    tokenizer = AutoTokenizer.from_pretrained("liaad/srl-en_mbert-base")
    model = AutoModel.from_pretrained("liaad/srl-en_mbert-base")
    
  3. Execution Environment: For optimal performance, especially for model training or inference, using a cloud GPU service such as AWS, Google Cloud, or Azure is recommended.

For more advanced usage, including the integration of a decoding layer, refer to the project's GitHub repository (srl_bert_pt).
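The decoding layer mentioned above maps each token's encoder representation to an SRL tag. As a rough, self-contained sketch of that idea (the tag set, random weights, and shapes below are illustrative assumptions, not the project's actual head):

```python
import numpy as np

# Hypothetical BIO-style tag set -- illustrative only, not the project's labels.
TAGS = ["O", "B-ARG0", "I-ARG0", "B-ARG1", "I-ARG1", "B-V"]
HIDDEN = 768  # hidden size of bert-base-multilingual-cased

rng = np.random.default_rng(0)

# Stand-in for model(**inputs).last_hidden_state for a 10-token sentence.
hidden_states = rng.normal(size=(10, HIDDEN))

# The simplest possible decoding layer: a linear token classifier.
W = rng.normal(size=(HIDDEN, len(TAGS)))
b = np.zeros(len(TAGS))
logits = hidden_states @ W + b                         # (seq_len, num_tags)
pred_tags = [TAGS[i] for i in logits.argmax(axis=-1)]  # one tag per token
```

In practice such a classifier is trained together with the fine-tuned encoder, and decoding is usually constrained so that an `I-` tag can only follow a matching `B-` tag.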

License

The model is licensed under the Apache 2.0 License, allowing for both commercial and non-commercial use, with proper attribution.
