DistilRuBERT-base-cased-conversational
Introduction
DistilRuBERT-base-cased-conversational is a distilled version of DeepPavlov's Conversational RuBERT, a Russian BERT model designed for conversational tasks. It is a compact model with 6 layers, 768 hidden units, 12 attention heads, and 135.4 million parameters.
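As a quick sanity check of the reported size, the parameter count can be verified after loading the model with the transformers library (a minimal sketch; assumes transformers and PyTorch are installed):
from transformers import AutoModel

# Download the model and count its parameters
model = AutoModel.from_pretrained("DeepPavlov/distilrubert-base-cased-conversational")
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")  # expected to be roughly 135M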
Architecture
This model is inspired by the DistilBERT architecture. Knowledge is distilled from a larger teacher model into the smaller student model using a combination of losses: KL divergence between teacher and student output logits, masked language modeling (MLM) loss, and cosine embedding loss between teacher and student hidden states. It is trained on datasets including OpenSubtitles, Dirty, Pikabu, and a segment of the Taiga corpus.
Training
The model was trained for approximately 100 hours on 8 NVIDIA Tesla P100-SXM2 16 GB GPUs. The training objective combined KL divergence loss (aligning the student's output logits with the teacher's), MLM loss, and cosine embedding loss (aligning the student's hidden states with the teacher's).
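As a rough illustration of how these losses can be combined, here is a minimal PyTorch sketch (not DeepPavlov's training code; the temperature and alpha weighting coefficients are illustrative assumptions, and the mapping between teacher and student hidden layers is omitted):
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, student_hidden, teacher_hidden,
                      temperature=2.0, alpha_kl=1.0, alpha_mlm=1.0, alpha_cos=1.0):
    # KL divergence between softened teacher and student output distributions
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Standard masked language modeling loss on the student's predictions
    mlm = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                          labels.view(-1), ignore_index=-100)

    # Cosine embedding loss pulling student hidden states toward the teacher's
    target = torch.ones(student_hidden.size(0) * student_hidden.size(1),
                        device=student_hidden.device)
    cos = F.cosine_embedding_loss(student_hidden.view(-1, student_hidden.size(-1)),
                                  teacher_hidden.view(-1, teacher_hidden.size(-1)), target)

    return alpha_kl * kl + alpha_mlm * mlm + alpha_cos * cos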
Guide: Running Locally
To run the model locally, follow these steps:
- Install Dependencies: Ensure that Python, PyTorch, and the transformers library are installed in your environment.
- Download the Model: Fetch the model files from Hugging Face; from_pretrained downloads them automatically on first use, or you can clone the model repository manually.
- Load the Model: Use the transformers library to load the model and tokenizer.
- Inference: Prepare your input text and run it through the model to obtain predictions (see the example below).
- Environment: For optimal performance, consider using a cloud GPU service such as AWS, Google Cloud, or Azure.
Example code snippet:
from transformers import AutoModel, AutoTokenizer

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/distilrubert-base-cased-conversational")
model = AutoModel.from_pretrained("DeepPavlov/distilrubert-base-cased-conversational")

# Tokenize a Russian input ("Ваш текст здесь" = "Your text here") and run a forward pass
inputs = tokenizer("Ваш текст здесь", return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state holds contextual token embeddings
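Because AutoModel returns raw hidden states rather than task-specific predictions, a common follow-up step is to pool them into a sentence embedding. A minimal continuation of the snippet above (mean pooling is one common choice, not prescribed by the model card):
import torch

# Mean-pool token embeddings over non-padding positions to get one vector per sentence
with torch.no_grad():
    outputs = model(**inputs)
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])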
License
The model is licensed under a perpetual, non-exclusive license as per the terms provided by arXiv.org. For any use in research, please cite the associated paper using the provided citation format.