distiluse-base-multilingual-cased-v1
Introduction
The distiluse-base-multilingual-cased-v1 model from the Sentence-Transformers library maps sentences and paragraphs into a 512-dimensional vector space. It is particularly useful for tasks such as clustering and semantic search, and it supports 14 languages, including English, French, German, and Chinese.
Architecture
The model architecture comprises several components:
- Transformer: Based on DistilBertModel, configured with a maximum sequence length of 128 and no lowercasing.
- Pooling Layer: Utilizes mean pooling over tokens to create sentence embeddings.
- Dense Layer: Reduces the dimensionality from 768 to 512 using a dense layer with a Tanh activation function.
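The pooling and dense steps above can be sketched in a few lines of NumPy. This is an illustrative toy only: random weights stand in for the trained parameters, which in the real model come from the checkpoint.

```python
import numpy as np

# Illustrative sketch of the pooling and dense components.
# Random values stand in for real DistilBERT token outputs and trained weights.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(10, 768))  # 10 tokens, 768-dim each

# Mean pooling over tokens -> one 768-dim sentence vector
pooled = token_embeddings.mean(axis=0)

# Dense layer with Tanh activation: 768 -> 512
W = rng.normal(size=(768, 512))  # stand-in for the trained dense weights
b = np.zeros(512)
sentence_embedding = np.tanh(pooled @ W + b)

print(sentence_embedding.shape)  # (512,)
```

The Tanh activation also bounds every component of the final embedding to the range [-1, 1].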
Training
The model was trained by the Sentence-Transformers team. It leverages the Sentence-BERT framework to generate sentence embeddings using a Siamese BERT-network approach. For more details on the training process, refer to the publication "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks."
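The core of the Siamese setup is that a single encoder with one set of shared weights processes both sentences of a pair, and training adjusts those weights so that similar pairs end up close in embedding space. A minimal sketch of the shared-weights idea (a random linear map stands in for the full transformer encoder):

```python
import numpy as np

# Toy sketch of the Siamese idea: ONE shared parameter set encodes both inputs.
# A random linear map stands in for the full transformer encoder.
rng = np.random.default_rng(7)
W = rng.normal(size=(300, 512))  # shared weights, used for every sentence

def encode(features):
    """Same parameters for every input -- this sharing is what makes it 'Siamese'."""
    return np.tanh(features @ W)

a = rng.normal(size=300)  # stand-in features for sentence A
b = rng.normal(size=300)  # stand-in features for sentence B

emb_a, emb_b = encode(a), encode(b)
similarity = (emb_a @ emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
```

In actual Sentence-BERT training, a loss on this similarity (or on a classification head over the pair) backpropagates through the shared encoder; see the cited publication for the exact objectives.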
Guide: Running Locally
To use the distiluse-base-multilingual-cased-v1 model locally, follow these steps:
- Install the Sentence-Transformers library:
pip install -U sentence-transformers
- Load and use the model in your Python script:
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('sentence-transformers/distiluse-base-multilingual-cased-v1')
embeddings = model.encode(sentences)
print(embeddings)
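The vectors returned by `model.encode` can be compared with cosine similarity, which is the basis of the semantic-search use case mentioned above. A self-contained sketch, with random 512-dimensional vectors standing in for real model outputs (the query is constructed near corpus entry 2, so it should rank highest):

```python
import numpy as np

# Semantic-search sketch: rank corpus entries by cosine similarity to a query.
# Random 512-dim vectors stand in for real model.encode() outputs.
rng = np.random.default_rng(42)
corpus_embeddings = rng.normal(size=(5, 512))               # 5 corpus sentences
query_embedding = corpus_embeddings[2] + 0.1 * rng.normal(size=512)  # near entry 2

def cosine_sim(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

scores = np.array([cosine_sim(query_embedding, e) for e in corpus_embeddings])
best = int(scores.argmax())
print(best)  # entry 2 ranks highest, since the query was built from it
```

With real data, you would replace the random arrays with `model.encode(corpus_sentences)` and `model.encode(query)`.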
For optimal performance, consider using cloud GPU services such as AWS EC2, Google Cloud, or Azure Machine Learning.
License
The distiluse-base-multilingual-cased-v1 model is licensed under the Apache 2.0 License, which allows both personal and commercial use, modification, and distribution.