S-Bluebert-snli-multinli-stsb
Introduction
S-Bluebert-snli-multinli-stsb is a sentence-transformers model that maps sentences and paragraphs to a 768-dimensional dense vector space, making it suitable for tasks such as clustering and semantic search.
Architecture
The model employs a SentenceTransformer architecture comprising:
- A Transformer layer based on BertModel with a maximum sequence length of 75 and no lower casing.
- A Pooling layer using mean token pooling over 768-dimensional word embeddings.
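For reference, this composition can be expressed with the sentence-transformers modules API. The following is a minimal sketch, with parameter values taken from the configuration above:

```python
from sentence_transformers import SentenceTransformer, models

# Transformer module: BertModel backbone, max sequence length 75, no lower casing.
word_embedding_model = models.Transformer(
    'pritamdeka/S-Bluebert-snli-multinli-stsb',
    max_seq_length=75,
    do_lower_case=False,
)

# Pooling module: mean pooling over the 768-dimensional token embeddings.
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 768 for this model
    pooling_mode_mean_tokens=True,
    pooling_mode_cls_token=False,
    pooling_mode_max_tokens=False,
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```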
Training
The model was trained using:
- DataLoader configured with a batch size of 64.
- A CosineSimilarityLoss function.
- Training parameters included 4 epochs, a learning rate of 2e-05, and a weight decay of 0.01.
- The optimizer used was AdamW, and the scheduler was set to WarmupLinear.
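A minimal sketch of what such a training run looks like with the sentence-transformers fit API is shown below. The two InputExample pairs are illustrative placeholders, not the actual NLI/STS-B training data, and any setting not listed above (e.g. warmup steps) is left at the library default:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer('pritamdeka/S-Bluebert-snli-multinli-stsb')

# Placeholder sentence pairs with similarity labels; the real training
# data is not reproduced here.
train_examples = [
    InputExample(texts=['A man is eating food.', 'A man eats something.'], label=0.8),
    InputExample(texts=['A plane is taking off.', 'A dog runs in a field.'], label=0.1),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)
train_loss = losses.CosineSimilarityLoss(model=model)

# AdamW is the fit() default optimizer; scheduler, learning rate, weight
# decay, and epoch count mirror the configuration listed above.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=4,
    scheduler='WarmupLinear',
    optimizer_params={'lr': 2e-05},
    weight_decay=0.01,
)
```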
Guide: Running Locally
- Install Dependencies: Ensure you have `sentence-transformers` installed, or alternatively `transformers` and `torch`:

```bash
pip install -U sentence-transformers
pip install transformers torch
```
- Using Sentence-Transformers:
```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('pritamdeka/S-Bluebert-snli-multinli-stsb')
embeddings = model.encode(sentences)
print(embeddings)
```
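Beyond printing raw embeddings, a common next step is the semantic search mentioned in the introduction. The sketch below scores a query against a small corpus with `util.cos_sim`; the query and corpus sentences are illustrative placeholders:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('pritamdeka/S-Bluebert-snli-multinli-stsb')

# Illustrative placeholder corpus and query.
corpus = [
    "The patient was prescribed antibiotics for the infection.",
    "Stock prices fell sharply after the announcement.",
]
query = "The doctor gave the patient medication."

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and each corpus sentence;
# higher scores indicate closer meaning.
scores = util.cos_sim(query_embedding, corpus_embeddings)
print(scores)
```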
- Using Transformers:
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Mean pooling: average the token embeddings, weighting by the attention
# mask so padding tokens are ignored.
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element holds all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

tokenizer = AutoTokenizer.from_pretrained('pritamdeka/S-Bluebert-snli-multinli-stsb')
model = AutoModel.from_pretrained('pritamdeka/S-Bluebert-snli-multinli-stsb')

sentences = ['This is an example sentence', 'Each sentence is converted']

# Tokenize, run the model, then pool token embeddings into sentence embeddings.
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    model_output = model(**encoded_input)

sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```
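If you need cosine similarities from this transformers-based pipeline, a common follow-up is to L2-normalize the pooled embeddings so that dot products become cosine similarities. The random tensor below merely stands in for the `sentence_embeddings` computed above:

```python
import torch
import torch.nn.functional as F

# Stand-in for the (batch_size x 768) `sentence_embeddings` tensor from
# the snippet above; replace with the real pooled output.
sentence_embeddings = torch.randn(2, 768)

# L2-normalize so that dot products equal cosine similarities.
normalized = F.normalize(sentence_embeddings, p=2, dim=1)
cosine_similarities = normalized @ normalized.T
print(cosine_similarities)
```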
- Cloud GPUs: For faster inference, consider running the model on a cloud platform with GPU support, such as AWS, Google Cloud, or Azure.
License
The model and its code are provided under terms that require appropriate citation of the 2021 publication by Deka and Jurek-Loughrey.