DistilCamemBERT-base (cmarkea)
Introduction
DistilCamemBERT is a distilled version of the CamemBERT model, specifically designed for the French language. Distillation aims to reduce model complexity while maintaining performance, as demonstrated in the DistilBERT paper. The training approach is inspired by the code used for DistilBERT.
Architecture
The DistilCamemBERT model follows a distillation process to approximate the performance of the original CamemBERT model with reduced computational complexity. The training involves a loss function comprising three components:
- DistilLoss: Pushes the student's output distribution toward the teacher's predictions (the distillation term).
- CosineLoss: Ensures collinearity between the last hidden layers of the student and teacher models.
- MLMLoss: Maintains the original Masked Language Modeling task.
The final loss is a weighted combination:
Loss = 0.5 × DistilLoss + 0.3 × CosineLoss + 0.2 × MLMLoss
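To make the weighting concrete, here is a minimal PyTorch sketch of such a combined loss. It is an illustration rather than the actual training code: the function name, the temperature value, and the use of KL divergence for the distillation term are assumptions made for the example.

import torch
import torch.nn.functional as F

def combined_distillation_loss(student_logits, teacher_logits,
                               student_hidden, teacher_hidden,
                               labels, temperature=2.0):
    # DistilLoss: KL divergence between temperature-softened student and
    # teacher prediction distributions (temperature is an assumed value).
    distil_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # CosineLoss: push the last hidden states of the student to be collinear
    # with those of the teacher (a target of 1 means "same direction").
    hidden_dim = student_hidden.size(-1)
    target = torch.ones(student_hidden.view(-1, hidden_dim).size(0),
                        device=student_hidden.device)
    cosine_loss = F.cosine_embedding_loss(
        student_hidden.view(-1, hidden_dim),
        teacher_hidden.view(-1, hidden_dim),
        target,
    )

    # MLMLoss: standard Masked Language Modeling cross-entropy on the
    # masked positions (labels set to -100 are ignored).
    mlm_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    # Weighted combination from the formula above.
    return 0.5 * distil_loss + 0.3 * cosine_loss + 0.2 * mlm_loss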
Training
The model was trained on the French subset of the OSCAR dataset, approximately 140 GB of text. Training was conducted on an NVIDIA Titan RTX GPU over 18 days.
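For reference, the French OSCAR subset can be inspected with the Hugging Face datasets library. The snippet below is only an illustrative sketch of data access, not the training pipeline; the configuration name "unshuffled_deduplicated_fr" is an assumption, and recent versions of datasets may additionally require trust_remote_code=True for OSCAR.

from datasets import load_dataset

# Stream the French subset instead of downloading ~140 GB locally.
oscar_fr = load_dataset("oscar", "unshuffled_deduplicated_fr",
                        split="train", streaming=True)

# Peek at a few documents.
for example in oscar_fr.take(3):
    print(example["text"][:100])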
Guide: Running Locally
To use DistilCamemBERT, follow these steps:
- Install Dependencies:
  pip install transformers
- Load the Model and Tokenizer (see the encoder usage sketch after these steps):
  from transformers import AutoTokenizer, AutoModel

  tokenizer = AutoTokenizer.from_pretrained("cmarkea/distilcamembert-base")
  model = AutoModel.from_pretrained("cmarkea/distilcamembert-base")
  model.eval()
- Use the Fill-Mask Pipeline:
  from transformers import pipeline

  model_fill_mask = pipeline("fill-mask", model="cmarkea/distilcamembert-base", tokenizer="cmarkea/distilcamembert-base")
  results = model_fill_mask("Le camembert est <mask> :)")
  print(results)
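Each prediction returned by the fill-mask pipeline is a dictionary containing, among other fields, token_str (the proposed token) and score (its probability). The small loop below is one way to read the output of the step above.

for prediction in results:
    # 'token_str' holds the predicted token, 'score' its probability.
    print(prediction["token_str"], round(prediction["score"], 3))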
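Beyond the fill-mask pipeline, the encoder loaded in the second step can be queried directly for contextual embeddings. This is a minimal sketch assuming the tokenizer and model objects created above; the example sentence is arbitrary.

import torch

inputs = tokenizer("J'aime le camembert !", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)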
Cloud GPUs
For faster training and inference, consider cloud GPU services such as AWS EC2 GPU instances, Google Cloud Platform, or Azure.
License
DistilCamemBERT is released under the MIT license, allowing for wide usage and modification.