DistilCamemBERT-base (cmarkea)
Introduction
DistilCamemBERT is a distilled version of the CamemBERT model, specifically designed for the French language. Distillation aims to reduce model complexity while maintaining performance, as demonstrated in the DistilBERT paper. The training approach is inspired by the code used for DistilBERT.
Architecture
The DistilCamemBERT model follows a distillation process to approximate the performance of the original CamemBERT model with reduced computational complexity. The training involves a loss function comprising three components:
- DistilLoss: Pushes the student's output distribution toward the teacher's predictions (the distillation term).
- CosineLoss: Ensures collinearity between the last hidden layers of the student and teacher models.
- MLMLoss: Maintains the original Masked Language Modeling task.
The final loss is a weighted combination:
Loss = 0.5 × DistilLoss + 0.3 × CosineLoss + 0.2 × MLMLoss
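To make the weighting concrete, here is a minimal PyTorch sketch of such a combined loss. It is an illustration rather than the actual training code: the function name, the temperature value, and the use of KL divergence for the distillation term are assumptions made for the example.

import torch
import torch.nn.functional as F

def combined_distillation_loss(student_logits, teacher_logits,
                               student_hidden, teacher_hidden,
                               labels, temperature=2.0):
    # DistilLoss: KL divergence between temperature-softened student and
    # teacher prediction distributions (temperature is an assumed value).
    distil_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # CosineLoss: push the last hidden states of the student to be collinear
    # with those of the teacher (a target of 1 means "same direction").
    hidden_dim = student_hidden.size(-1)
    target = torch.ones(student_hidden.view(-1, hidden_dim).size(0),
                        device=student_hidden.device)
    cosine_loss = F.cosine_embedding_loss(
        student_hidden.view(-1, hidden_dim),
        teacher_hidden.view(-1, hidden_dim),
        target,
    )

    # MLMLoss: standard Masked Language Modeling cross-entropy on the
    # masked positions (labels set to -100 are ignored).
    mlm_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    # Weighted combination from the formula above.
    return 0.5 * distil_loss + 0.3 * cosine_loss + 0.2 * mlm_loss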
Training
The model was trained on the French subset of the OSCAR dataset, approximately 140 GB of text. Training was conducted on an NVIDIA Titan RTX GPU over 18 days.
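For reference, the French OSCAR subset can be inspected with the Hugging Face datasets library. The snippet below is only an illustrative sketch of data access, not the training pipeline; the configuration name "unshuffled_deduplicated_fr" is an assumption, and recent versions of datasets may additionally require trust_remote_code=True for OSCAR.

from datasets import load_dataset

# Stream the French subset instead of downloading ~140 GB locally.
oscar_fr = load_dataset("oscar", "unshuffled_deduplicated_fr",
                        split="train", streaming=True)

# Peek at a few documents.
for example in oscar_fr.take(3):
    print(example["text"][:100])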
Guide: Running Locally
To use DistilCamemBERT, follow these steps:
- Install Dependencies:
  pip install transformers
- Load the Model and Tokenizer (see the encoder usage sketch after these steps):
  from transformers import AutoTokenizer, AutoModel

  tokenizer = AutoTokenizer.from_pretrained("cmarkea/distilcamembert-base")
  model = AutoModel.from_pretrained("cmarkea/distilcamembert-base")
  model.eval()
- Use the Fill-Mask Pipeline:
  from transformers import pipeline

  model_fill_mask = pipeline("fill-mask", model="cmarkea/distilcamembert-base", tokenizer="cmarkea/distilcamembert-base")
  results = model_fill_mask("Le camembert est <mask> :)")
  print(results)
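Each prediction returned by the fill-mask pipeline is a dictionary containing, among other fields, token_str (the proposed token) and score (its probability). The small loop below is one way to read the output of the step above.

for prediction in results:
    # 'token_str' holds the predicted token, 'score' its probability.
    print(prediction["token_str"], round(prediction["score"], 3))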
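Beyond the fill-mask pipeline, the encoder loaded in the second step can be queried directly for contextual embeddings. This is a minimal sketch assuming the tokenizer and model objects created above; the example sentence is arbitrary.

import torch

inputs = tokenizer("J'aime le camembert !", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)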
Cloud GPUs
For faster training and inference, consider cloud GPU services such as AWS EC2 GPU instances, Google Cloud Platform, or Azure.
License
DistilCamemBERT is released under the MIT license, allowing for wide usage and modification.