roberta large mit restaurant
tner
Introduction
The roberta-large-mit-restaurant model (tner/roberta-large-mit-restaurant) is a version of RoBERTa-large fine-tuned for token classification (named entity recognition) in the restaurant domain. It was fine-tuned with the T-NER library on the tner/mit_restaurant dataset and recognizes restaurant-related entities such as amenities, cuisines, and dishes. It reaches an F1 score of 0.816 on the test set.
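The card does not say exactly how T-NER computes the F1 score. NER scores are commonly reported as entity-level, micro-averaged F1 (as in seqeval): an entity counts as correct only if both its span and its type match the gold annotation exactly. The sketch below assumes that definition; the function name `micro_f1` and the span tuple layout are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span layout: (sentence_index, start_token, end_token_exclusive, entity_type)
Entity = Tuple[int, int, int, str]


def micro_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1), assuming exact span-and-type matching."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(pred)
    # A prediction is a true positive only if it matches a gold entity exactly.
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 0, 1, "Amenity"), (0, 3, 5, "Cuisine")}
pred = {(0, 0, 1, "Amenity"), (0, 3, 4, "Cuisine")}  # Cuisine span is one token short
print(micro_f1(gold, pred))
```

Here the truncated Cuisine span counts as both a false positive and a false negative, so precision, recall, and F1 are all 0.5.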
Architecture
The model is built on RoBERTa-large, a 24-layer transformer encoder, with a Conditional Random Field (CRF) layer on top. The CRF scores transitions between adjacent labels, so the predicted tag sequence is chosen as a whole rather than token by token, which improves sequence labeling. Training was done with the T-NER library, a framework for fine-tuning transformer models for named entity recognition (NER).
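To illustrate what the CRF layer adds, the sketch below shows Viterbi decoding for a generic linear-chain CRF: given per-token label scores from the encoder (emissions) and a label-to-label transition matrix, it finds the highest-scoring label sequence. This is not T-NER's implementation; the function name is hypothetical, it omits start/end transition scores for brevity, and it assumes at least one token.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence for one sentence.

    emissions[t][j]: score of label j at token t (from the encoder).
    transitions[i][j]: score of moving from label i to label j.
    """
    num_labels = len(transitions)
    # best[j]: score of the best path ending in label j at the current token.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(num_labels):
            # Choose the previous label i maximizing best[i] + transitions[i][j].
            scores = [best[i] + transitions[i][j] for i in range(num_labels)]
            i_best = max(range(num_labels), key=lambda i: scores[i])
            new_best.append(scores[i_best] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(num_labels), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Labels: 0 = O, 1 = B-Dish, 2 = I-Dish (illustrative).
emissions = [[1.0, 0.7, 0.0], [0.0, 0.0, 1.0]]
# A large penalty on O -> I-Dish discourages an I- tag without a preceding B-.
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi_decode(emissions, transitions))
```

Picking the best label per token would give O followed by I-Dish; the transition penalty makes B-Dish, I-Dish the highest-scoring sequence instead.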
Training
The training process used the following hyperparameters:
- Dataset: tner/mit_restaurant
- Model: RoBERTa-large with CRF
- Max Length: 128
- Epochs: 15
- Batch Size: 64
- Learning Rate: 1e-05
- Random Seed: 42
Training included a hyperparameter search with T-NER to optimize performance on the restaurant-domain NER task.
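For reproducibility notes, the reported hyperparameters can be kept in a single typed, immutable object. The sketch below is a hypothetical `TrainingConfig` dataclass, not T-NER's own configuration format; only the values come from the list above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters reported for tner/roberta-large-mit-restaurant (hypothetical container)."""
    dataset: str = "tner/mit_restaurant"
    model: str = "roberta-large"
    crf: bool = True            # CRF layer on top of the encoder
    max_length: int = 128       # maximum input length in tokens
    epochs: int = 15
    batch_size: int = 64
    learning_rate: float = 1e-05
    random_seed: int = 42


config = TrainingConfig()
print(asdict(config))
```

Freezing the dataclass prevents accidental changes to a recorded configuration after it is created.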
Guide: Running Locally
To run the model locally, you can use the T-NER library:
- Install T-NER:
pip install tner
- Load the model and predict:
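from tner import TransformersNER
# Load the fine-tuned checkpoint (downloaded from the Hugging Face Hub on first use)
model = TransformersNER("tner/roberta-large-mit-restaurant")
# Run NER on a list of input sentences
model.predict(["Jacob Collier is a Grammy awarded English artist from London"])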
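The exact structure returned by `predict` is not described here, so check the T-NER documentation. If you end up with one IOB2 tag per token (for example "B-Cuisine" or "I-Hours"), the hypothetical helper below groups the tags into entity spans. The tag names in the example are illustrative.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2 tags into (entity_type, text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # Start a new entity; a stray I- tag is treated as a start.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-"):
            # Continue the open entity of the same type.
            current_tokens.append(token)
        else:
            # An "O" tag closes any open entity.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type = None
            current_tokens = []
    # Flush an entity that runs to the end of the sentence.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


print(bio_to_spans(["late", "night", "pizza", "near", "downtown"],
                   ["B-Hours", "I-Hours", "B-Dish", "O", "B-Location"]))
```

Treating a stray I- tag as the start of a new entity is one common convention; a stricter decoder could drop such tags instead.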
The model can run on a CPU, but a GPU speeds up inference considerably, especially for large datasets. If no local GPU is available, cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure are an option.
License
The model and accompanying resources are distributed under terms set by their creators. Refer to the Hugging Face model card and the T-NER library documentation for specific licensing information.