DistilRuBERT-base-cased-conversational
Introduction
DistilRuBERT-base-cased-conversational is a distilled version of DeepPavlov's Conversational RuBERT, a Russian BERT model designed for conversational tasks. It is a compact model with 6 layers, 768 hidden units, 12 attention heads, and 135.4 million parameters.
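As a quick sanity check of the reported size, the parameter count can be verified after loading the model with the transformers library (a minimal sketch; assumes transformers and PyTorch are installed):
from transformers import AutoModel

# Download the model and count its parameters
model = AutoModel.from_pretrained("DeepPavlov/distilrubert-base-cased-conversational")
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")  # expected to be roughly 135M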
Architecture
This model is inspired by the DistilBERT architecture. Knowledge is distilled from a larger teacher model into the smaller student model using a combination of losses: KL divergence between teacher and student output logits, masked language modeling (MLM) loss, and cosine embedding loss between teacher and student hidden states. It is trained on datasets including OpenSubtitles, Dirty, Pikabu, and a segment of the Taiga corpus.
Training
The model was trained for approximately 100 hours on 8 NVIDIA Tesla P100-SXM2 16 GB GPUs. The training objective combined KL divergence loss (aligning the student's output logits with the teacher's), MLM loss, and cosine embedding loss (aligning the student's hidden states with the teacher's).
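As a rough illustration of how these losses can be combined, here is a minimal PyTorch sketch (not DeepPavlov's training code; the temperature and alpha weighting coefficients are illustrative assumptions, and the mapping between teacher and student hidden layers is omitted):
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, student_hidden, teacher_hidden,
                      temperature=2.0, alpha_kl=1.0, alpha_mlm=1.0, alpha_cos=1.0):
    # KL divergence between softened teacher and student output distributions
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Standard masked language modeling loss on the student's predictions
    mlm = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                          labels.view(-1), ignore_index=-100)

    # Cosine embedding loss pulling student hidden states toward the teacher's
    target = torch.ones(student_hidden.size(0) * student_hidden.size(1),
                        device=student_hidden.device)
    cos = F.cosine_embedding_loss(student_hidden.view(-1, student_hidden.size(-1)),
                                  teacher_hidden.view(-1, teacher_hidden.size(-1)), target)

    return alpha_kl * kl + alpha_mlm * mlm + alpha_cos * cos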
Guide: Running Locally
To run the model locally, follow these steps:
- Install Dependencies: Ensure that Python, PyTorch, and the transformers library are installed in your environment.
- Download the Model: Fetch the model files from Hugging Face; from_pretrained downloads them automatically on first use, or you can clone the model repository manually.
- Load the Model: Use the transformers library to load the model and tokenizer.
- Inference: Prepare your input text and run it through the model to obtain predictions (see the example below).
- Environment: For optimal performance, consider using a cloud GPU service such as AWS, Google Cloud, or Azure.
Example code snippet:
from transformers import AutoModel, AutoTokenizer

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/distilrubert-base-cased-conversational")
model = AutoModel.from_pretrained("DeepPavlov/distilrubert-base-cased-conversational")

# Tokenize a Russian input ("Ваш текст здесь" = "Your text here") and run a forward pass
inputs = tokenizer("Ваш текст здесь", return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state holds contextual token embeddings
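Because AutoModel returns raw hidden states rather than task-specific predictions, a common follow-up step is to pool them into a sentence embedding. A minimal continuation of the snippet above (mean pooling is one common choice, not prescribed by the model card):
import torch

# Mean-pool token embeddings over non-padding positions to get one vector per sentence
with torch.no_grad():
    outputs = model(**inputs)
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])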
License
The model is licensed under a perpetual, non-exclusive license as per the terms provided by arXiv.org. For any use in research, please cite the associated paper using the provided citation format.