distilbert-portuguese-cased

adalbertojunior

Introduction

The distilbert-portuguese-cased model, created by Adalberto Ferreira Barbosa Junior, is a distilled version of BERTimbau, a BERT model pre-trained on Portuguese text. According to its author, it can reach up to 99% of the original BERTimbau's accuracy on some tasks. It is tagged for feature extraction on the Hugging Face Hub and can be used with the Transformers library and PyTorch.

Architecture

distilbert-portuguese-cased is derived from the BERT architecture. Distillation makes it smaller and more efficient than the original BERTimbau while preserving most of its performance. The model can be deployed with Hugging Face Inference Endpoints, and its weights are available in the safetensors format.
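
One way to check the size reduction yourself is to load the checkpoint and read its layer count and parameter count. This is a minimal sketch assuming a recent Transformers release with PyTorch installed; `MODEL_ID`, `format_params`, and `inspect_model` are illustrative names, not part of any published API. Recent Transformers versions load a safetensors checkpoint automatically when the repository provides one, so no extra flag is passed.

```python
# Illustrative sketch: inspect the size of the distilled checkpoint.
MODEL_ID = "adalbertojunior/distilbert-portuguese-cased"


def format_params(n: int) -> str:
    """Render a parameter count in millions, e.g. 66_000_000 -> '66.0M'."""
    return f"{n / 1e6:.1f}M"


def inspect_model(model_id: str = MODEL_ID) -> dict:
    """Download the checkpoint (cached after the first call) and summarise its size."""
    # Imported here so format_params stays usable without transformers installed
    from transformers import AutoModel

    model = AutoModel.from_pretrained(model_id)
    config = model.config
    return {
        "model_type": config.model_type,
        # DistilBERT configs alias num_hidden_layers to n_layers, so this works for both
        "layers": config.num_hidden_layers,
        "parameters": format_params(model.num_parameters()),
    }
```

Calling `inspect_model()` and then `inspect_model("neuralmind/bert-base-portuguese-cased")` (the Hub ID of BERTimbau Base) gives a side-by-side comparison. No figures are quoted here because they depend on the checkpoints as published.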

Training

The model was distilled from BERTimbau, a BERT-based model pre-trained on Portuguese. Although the distilled model is highly capable, fine-tuning it on your own task-specific data is recommended for optimal performance.
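
As a starting point for fine-tuning, the sketch below attaches a new classification head to the distilled encoder and runs a single training step. It assumes PyTorch and Transformers are installed and uses a hypothetical two-class sentiment task; the label names, helper functions, and learning rate are illustrative choices, not settings recommended by the model's author.

```python
# Minimal fine-tuning sketch (hypothetical sentiment task, illustrative helpers).
MODEL_ID = "adalbertojunior/distilbert-portuguese-cased"


def build_label_maps(labels):
    """Map each class name to an integer id and back, preserving order."""
    id2label = {i: name for i, name in enumerate(labels)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_classifier(labels, model_id=MODEL_ID):
    """Load the tokenizer and the encoder with a freshly initialised classification head."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id, do_lower_case=False)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=len(labels), id2label=id2label, label2id=label2id
    )
    return tokenizer, model


def training_step(tokenizer, model, optimizer, texts, label_ids):
    """Run one optimisation step on a small batch and return the loss."""
    import torch

    model.train()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    # Pass only ids and mask; DistilBERT-style models reject token_type_ids
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=torch.tensor(label_ids),  # integer labels -> cross-entropy loss
    )
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

For example, `tokenizer, model = load_classifier(["negativo", "positivo"])`, then `optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)` and repeated calls to `training_step` over your labelled batches. A real run would add a data loader and evaluation, or use the library's `Trainer` class instead.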

Guide: Running Locally

  1. Install the Hugging Face Transformers library and PyTorch:

    pip install transformers torch
    
  2. Load the model and tokenizer:

    from transformers import AutoTokenizer, AutoModelForPreTraining
    
    # Load the model together with its pre-training head(s)
    model = AutoModelForPreTraining.from_pretrained('adalbertojunior/distilbert-portuguese-cased')
    # The model is cased, so the tokenizer must not lowercase its input
    tokenizer = AutoTokenizer.from_pretrained('adalbertojunior/distilbert-portuguese-cased', do_lower_case=False)
    
  3. Fine-tune the model on your dataset to achieve the best results.
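
Because the model is tagged for feature extraction, a common use is turning sentences into fixed-size vectors. The sketch below loads the bare encoder (no pre-training head) via `AutoModel`, averages the final hidden states over non-padding tokens, and compares two sentences with cosine similarity. Mean pooling is one common choice assumed here, not a method prescribed by the model card, and the helper names are illustrative.

```python
# Sentence-embedding sketch: mean pooling over the encoder's last hidden state.
import math

MODEL_ID = "adalbertojunior/distilbert-portuguese-cased"


def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping positions where the mask is 0."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    return [sum(column) / len(kept) for column in zip(*kept)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def load_encoder(model_id=MODEL_ID):
    """Load the tokenizer and the bare encoder in inference mode."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, do_lower_case=False)
    model = AutoModel.from_pretrained(model_id).eval()
    return tokenizer, model


def embed(text, tokenizer, model):
    """Return one embedding (a list of floats) for a single sentence."""
    import torch

    batch = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # Pass only ids and mask; DistilBERT-style models reject token_type_ids
        hidden = model(
            input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]
        ).last_hidden_state
    return mean_pool(hidden[0].tolist(), batch["attention_mask"][0].tolist())
```

For example, `tokenizer, model = load_encoder()` followed by `cosine_similarity(embed("O gato dorme.", tokenizer, model), embed("O felino está dormindo.", tokenizer, model))`. No similarity score is quoted because it depends on the checkpoint.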

For faster training and inference, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure.
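
Whether the GPU is in the cloud or local, PyTorch only uses it if the model and its inputs are moved onto it. A minimal sketch, assuming PyTorch with CUDA support; `choose_device` and `encode_on_device` are illustrative helper names.

```python
# Device-selection sketch: use a CUDA GPU when PyTorch can see one.
MODEL_ID = "adalbertojunior/distilbert-portuguese-cased"


def choose_device() -> str:
    """Return "cuda" if PyTorch reports a usable GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def encode_on_device(texts, model_id=MODEL_ID):
    """Load the encoder on the chosen device and return its last hidden states."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    device = choose_device()
    tokenizer = AutoTokenizer.from_pretrained(model_id, do_lower_case=False)
    model = AutoModel.from_pretrained(model_id).to(device)
    # Inputs must live on the same device as the model's weights
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
    return outputs.last_hidden_state
```

The same two `.to(device)` calls apply to the fine-tuning loop: move the model once, then move each batch before the forward pass.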

License

The license terms are stated on the model's page on the Hugging Face Model Hub.
