smartmind/roberta-ko-small-tsdae
Introduction
The roberta-ko-small-tsdae model by smartmind is a sentence-transformers model that maps sentences and paragraphs to 256-dimensional dense vectors. It is designed for tasks such as clustering and semantic search and is specifically tailored for the Korean language.
Architecture
The model is a small Korean RoBERTa pretrained with TSDAE (Transformer-based Sequential Denoising Auto-Encoder), as detailed in arXiv:2104.06979. Its architecture is similar to lassl/roberta-ko-small, but it uses a different tokenizer. The model consists of a Transformer layer followed by a Pooling layer configured for CLS token pooling.
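As a rough illustration of that stack, the minimal sketch below shows how an equivalent Transformer-plus-CLS-pooling model could be assembled with the sentence-transformers modules API; in practice, loading the model by name (as in the guide below) restores this configuration automatically.

from sentence_transformers import SentenceTransformer, models

# Wrap the underlying RoBERTa checkpoint as a sentence-transformers Transformer module.
word_embedding_model = models.Transformer("smartmind/roberta-ko-small-tsdae")

# CLS pooling: use the hidden state of the first token as the sentence embedding.
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="cls",
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])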
Training
The model was evaluated on the KLUE STS dataset and reached good correlation scores without any fine-tuning. Reported metrics include Pearson and Spearman correlations computed over cosine similarity as well as Euclidean and Manhattan distances.
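The sketch below shows one way such an evaluation could be run with sentence-transformers' EmbeddingSimilarityEvaluator. The KLUE STS field names and the 0 to 5 gold-score range are assumptions based on the Hugging Face klue dataset and should be checked against the actual data before relying on the numbers.

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("smartmind/roberta-ko-small-tsdae")

# KLUE STS validation split; gold scores range from 0 to 5, so rescale to [0, 1].
klue_sts = load_dataset("klue", "sts", split="validation")
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=klue_sts["sentence1"],
    sentences2=klue_sts["sentence2"],
    scores=[ex["label"] / 5.0 for ex in klue_sts["labels"]],
    name="klue-sts-validation",
)

# Prints the evaluator's correlation result(s) for the model.
print(evaluator(model))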
Guide: Running Locally
Basic Steps
- Install Sentence-Transformers:
pip install -U sentence-transformers
- Load and Use the Model:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('smartmind/roberta-ko-small-tsdae')
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings)
- Using Hugging Face Transformers:
Install the transformers library and use the following code to load the model without sentence-transformers:
from transformers import AutoTokenizer, AutoModel
import torch

# CLS pooling: take the hidden state of the first ([CLS]) token as the sentence embedding.
def cls_pooling(model_output, attention_mask):
    return model_output[0][:, 0]

sentences = ["This is an example sentence", "Each sentence is converted"]

tokenizer = AutoTokenizer.from_pretrained('smartmind/roberta-ko-small-tsdae')
model = AutoModel.from_pretrained('smartmind/roberta-ko-small-tsdae')

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)

sentence_embeddings = cls_pooling(model_output, encoded_input['attention_mask'])
print(sentence_embeddings)
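Since semantic search is one of the card's stated use cases, here is a minimal sketch of ranking a small Korean corpus against a query by cosine similarity; the corpus and query strings are made up purely for illustration.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('smartmind/roberta-ko-small-tsdae')

# Hypothetical Korean corpus and query for a semantic-search style lookup.
corpus = ["날씨가 정말 좋네요.", "주말에 등산을 갈 예정입니다.", "이 식당은 김치찌개가 맛있어요."]
query = "이번 주말에 산에 가려고 해요."

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every corpus sentence.
scores = util.cos_sim(query_embedding, corpus_embeddings)
best_idx = scores.argmax().item()
print(corpus[best_idx], scores[0][best_idx].item())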
Cloud GPUs
For optimal performance, especially with large datasets or real-time applications, consider running the model on a cloud GPU service such as AWS EC2, Google Cloud Platform, or Azure.
License
This model is licensed under the MIT License, allowing for wide usage and distribution.