rubert-base-cased-nli-twoway
Introduction
The cointegrated/rubert-base-cased-nli-twoway model is a fine-tuned version of DeepPavlov's rubert-base-cased, tailored for Natural Language Inference (NLI) in Russian. It predicts whether the logical relationship between two short texts is entailment or not entailment.
Architecture
This model is based on the BERT encoder architecture. Because it is an NLI model, it also supports zero-shot classification: each candidate label can be phrased as a hypothesis and scored by its entailment probability, without any task-specific training, as illustrated in the sketch below. The model was fine-tuned on the cointegrated/nli-rus-translated-v2021 dataset.
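Below is a minimal zero-shot sketch using the Transformers zero-shot-classification pipeline; the Russian hypothesis template and candidate labels are illustrative choices, and the pipeline relies on the model config exposing an entailment label.

```python
from transformers import pipeline

# Zero-shot classification via NLI: each candidate label is rewritten as a hypothesis
# ("Это текст про <label>.") and scored by the model's entailment probability.
classifier = pipeline(
    "zero-shot-classification",
    model="cointegrated/rubert-base-cased-nli-twoway",
)

result = classifier(
    "Я хочу поехать в Австралию",                         # text to classify
    candidate_labels=["спорт", "путешествия", "музыка"],  # illustrative labels
    hypothesis_template="Это текст про {}.",              # illustrative template
)
print(result["labels"], result["scores"])
```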
Training
The model is fine-tuned for NLI on a dataset of examples translated into Russian. It is trained to distinguish entailment from non-entailment relationships, making it suitable for tasks that require textual understanding and analysis.
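If you want to inspect the training data, here is a minimal sketch, assuming the dataset is hosted on the Hugging Face Hub under the name given above; split and column names may differ.

```python
from datasets import load_dataset

# Load the NLI dataset referenced above (assumes it is available on the Hugging Face Hub).
ds = load_dataset("cointegrated/nli-rus-translated-v2021")
print(ds)  # lists the available splits and their columns

split = list(ds.keys())[0]  # split names may vary; inspect the first one
print(ds[split][0])         # one premise/hypothesis example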
Guide: Running Locally
To run this model locally, follow these steps:
- Install Required Libraries: Ensure you have Python installed, along with the Hugging Face Transformers and PyTorch libraries.

  ```
  pip install transformers torch
  ```
- Download the Model: Use the Transformers library to load the model and tokenizer.

  ```python
  from transformers import AutoModelForSequenceClassification, AutoTokenizer

  model_name = "cointegrated/rubert-base-cased-nli-twoway"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForSequenceClassification.from_pretrained(model_name)
  ```
- Inference: Pass the two texts to the tokenizer as a premise/hypothesis pair and read the class probabilities from the model output. A zero-shot variant that scores several candidate labels (e.g. «спорт», «путешествия», «музыка») is sketched after this list.

  ```python
  import torch

  premise = "Я хочу поехать в Австралию"
  hypothesis = "Это текст про путешествия."
  inputs = tokenizer(premise, hypothesis, return_tensors="pt")
  proba = torch.softmax(model(**inputs).logits, dim=-1)[0]
  print({model.config.id2label[i]: p.item() for i, p in enumerate(proba)})
  ```
- Cloud GPUs: For enhanced performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure to handle large-scale data processing.
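As mentioned in the inference step, the following sketch reuses the already-loaded model and tokenizer to rank several candidate labels by entailment probability; the hypothesis template, the label list, and the assumption that the config names its positive class "entailment" are illustrative choices.

```python
import torch

def zero_shot(text, labels, template="Это текст про {}."):
    """Score each candidate label by the probability that the text entails the hypothesis."""
    scores = {}
    for label in labels:
        inputs = tokenizer(text, template.format(label), return_tensors="pt")
        with torch.no_grad():
            proba = torch.softmax(model(**inputs).logits, dim=-1)[0]
        # Assumes the config labels the positive class "entailment"; adjust if it differs.
        entail_id = model.config.label2id.get("entailment", 0)
        scores[label] = proba[entail_id].item()
    return scores

print(zero_shot("Я хочу поехать в Австралию", ["спорт", "путешествия", "музыка"]))
```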
License
The model is released under the MIT license, allowing for both personal and commercial use, modification, and distribution. Make sure to review the license for more details on limitations and permissions.