roberta-finetuned-CPV_Spanish
Introduction
The roberta-finetuned-CPV_Spanish model is a fine-tuned version of PlanTL-GOB-ES/roberta-base-bne, adapted to Spanish public procurement documents from 2019. It predicts the first two digits (the division) of CPV (Common Procurement Vocabulary) codes, reaching an F1 score of 0.7918 and an accuracy of 0.7376.
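To make the prediction target concrete, a CPV code's first two digits identify its division; the following minimal sketch extracts them (the sample code shown is illustrative):

```python
def cpv_division(cpv_code: str) -> str:
    """Return the two-digit division of a CPV code such as '45233141-9'."""
    digits = cpv_code.split("-")[0]  # drop the check digit after the hyphen
    return digits[:2]

print(cpv_division("45233141-9"))  # → 45
```

The model classifies a procurement document directly into one or more of these two-digit divisions rather than predicting full eight-digit codes.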
Architecture
This model is based on the RoBERTa architecture, fine-tuned for text classification of CPV codes in Spanish. It is built with the Hugging Face Transformers library, implemented in PyTorch, and its weights are stored in the safetensors format for safe, efficient loading.
Training
The model was trained using a dataset from Spanish Public Procurement documents with the following hyperparameters:
- Learning rate: 2e-05
- Train and eval batch size: 8
- Seed: 42
- Optimizer: Adam (betas=(0.9,0.999), epsilon=1e-08)
- LR scheduler type: linear
- Number of epochs: 10
The training process was evaluated with metrics such as F1, ROC AUC, and accuracy, yielding a validation loss of 0.0465 and an F1 score of 0.7918 at the final epoch.
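The hyperparameters above map directly onto the Transformers `TrainingArguments` configuration; the following is a minimal sketch, assuming the Trainer API was used (the output directory name is illustrative):

```python
# Hedged sketch: the reported hyperparameters expressed as a
# transformers TrainingArguments configuration.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="roberta-finetuned-CPV_Spanish",  # illustrative name
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
)
# Adam betas (0.9, 0.999) and epsilon 1e-8 are the transformers defaults
# (adam_beta1, adam_beta2, adam_epsilon), so they need not be set explicitly.
```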
Guide: Running Locally
To run this model locally, follow these steps:
- Install the necessary Python packages: `transformers`, `torch`, `datasets`, and `tokenizers`.
- Clone the model repository or download the model files from Hugging Face.
- Load the model using the `transformers` library.
- Prepare your input data in the required format for prediction.
For enhanced performance, using cloud GPUs such as those provided by AWS, Google Cloud, or Azure is recommended.
License
The roberta-finetuned-CPV_Spanish model is distributed under the Apache 2.0 License, which permits broad use, modification, and redistribution subject to its conditions. Ensure compliance with the license terms when using this model.