distilBERT-finetuned-resumes-sections

has-abi

Introduction

distilBERT-finetuned-resumes-sections is a fine-tuned version of Geotrend/distilbert-base-en-fr-cased. It is a text classification model tuned to recognize the different sections of a resume. It was evaluated on loss, F1 score, ROC AUC, and accuracy.

Architecture

This model uses the DistilBERT architecture, a distilled version of BERT that trades a small amount of accuracy for a smaller, faster model. It is implemented in PyTorch and was developed with the Hugging Face Transformers library.

Training

The model was fine-tuned with the following hyperparameters:

  • Learning Rate: 2e-5
  • Train Batch Size: 8
  • Eval Batch Size: 8
  • Seed: 42
  • Optimizer: Adam with betas (0.9, 0.999) and epsilon 1e-8
  • Learning Rate Scheduler: Linear
  • Number of Epochs: 20
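To make the "Linear" scheduler entry concrete, here is a minimal sketch of how the learning rate would decay from 2e-5 to zero over a run. It assumes no warmup (the card does not state a warmup setting), and the number of steps per epoch is a hypothetical value, since the dataset size is not given.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length: 100 optimizer steps per epoch for 20 epochs.
total_steps = 100 * 20
for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps))
```

Under these assumptions the rate starts at the configured 2e-5, is halved at the midpoint of training, and reaches zero at the final step.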

During training, the model achieved the following results on the evaluation set:

  • Loss: 0.0369
  • F1 Score: 0.9652
  • ROC AUC: 0.9808
  • Accuracy: 0.9621
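The card does not say how these metrics were computed or averaged across section labels. As an illustration only, the sketch below computes accuracy, F1, and ROC AUC for a single binary label in plain Python; the toy labels and scores are made up, and the 0.5 threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, not taken from the model's evaluation set.
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6]
y_pred = [int(s >= 0.5) for s in scores]  # assumed decision threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

With several section labels, each metric would be computed per label and then averaged; which averaging scheme produced the figures above is not stated.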

The framework versions used include Transformers 4.21.1, PyTorch 1.12.1+cu113, Datasets 2.4.0, and Tokenizers 0.12.1.
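When reproducing the results, it can help to compare the locally installed packages against the versions listed above. The following is a small sketch using only the standard library; the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the pip package names, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib import metadata

# Versions reported on the model card.
EXPECTED = {
    "transformers": "4.21.1",
    "torch": "1.12.1+cu113",
    "datasets": "2.4.0",
    "tokenizers": "0.12.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(expected):
    """Map each package to a (card_version, installed_version) pair."""
    return {pkg: (want, installed_version(pkg)) for pkg, want in expected.items()}


for pkg, (want, have) in version_report(EXPECTED).items():
    status = "ok" if want == have else "differs"
    print(f"{pkg}: card={want} installed={have} ({status})")
```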

Guide: Running Locally

  1. Install Dependencies: Ensure you have Python installed, and install the required libraries using pip:

    pip install torch transformers datasets
    
  2. Load the Model:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    
    model_name = "has-abi/distilBERT-finetuned-resumes-sections"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
  3. Inference Example:

    import torch
    inputs = tokenizer("Your resume text here", return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, so skip gradient tracking
        outputs = model(**inputs)
    
  4. Suggestions for Cloud GPUs: For enhanced performance, consider using cloud services like AWS, Google Cloud, or Azure to access powerful GPUs.
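Step 3 leaves the raw logits in `outputs`, which still need to be mapped to a section name. The sketch below shows one way to do that in plain Python, assuming a single-label setup where softmax is appropriate; the card does not confirm this, and a multi-label model would need a per-label sigmoid instead. The logits and label names here are hypothetical placeholders.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return the highest-scoring label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical values standing in for real model output and label names.
example_logits = [0.1, 2.3, -1.0]
example_id2label = {0: "education", 1: "experience", 2: "skills"}

label, score = top_label(example_logits, example_id2label)
print(label, round(score, 3))
```

With the loaded model, the inputs would be `outputs.logits[0].tolist()` for the logits and `model.config.id2label` for the label mapping.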

License

This model is licensed under the Apache 2.0 License, which allows for both personal and commercial use, modification, and distribution.
