Granite-Embedding-107M-Multilingual
Introduction
GRANITE-EMBEDDING-107M-MULTILINGUAL is a multilingual sentence-similarity model developed by IBM and hosted on the Hugging Face platform. Built on the xlm-roberta architecture, it generates embeddings for feature extraction and sentence similarity across 12 languages, and it can be served with the text-embeddings-inference toolkit.
Architecture
The model uses the xlm-roberta architecture, a multilingual variant of RoBERTa pretrained for cross-lingual representation. It is built with the PyTorch library and packaged in the Safetensors format for efficient, safe model storage, and it produces embeddings suitable for multilingual applications.
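As a quick sanity check of these architectural claims, the model configuration can be inspected without downloading the full weights. In this sketch, the repository id is inferred from the model name, and the printed values are what one would expect from an XLM-RoBERTa encoder rather than figures quoted from the model card:

```python
from transformers import AutoConfig

# Fetch only the configuration file, not the model weights.
config = AutoConfig.from_pretrained("ibm-granite/granite-embedding-107m-multilingual")

print(config.model_type)         # expected: "xlm-roberta"
print(config.hidden_size)        # dimensionality of the output embeddings
print(config.num_hidden_layers)  # depth of the encoder
```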
Training
While specific details of the training process are not provided, the model is likely trained on multilingual datasets to optimize for sentence similarity across diverse languages. It can be evaluated with MTEB (the Massive Text Embedding Benchmark).
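For reference, an evaluation run with the mteb package might look like the sketch below; the task name STS22 is an illustrative choice rather than one reported for this model, and the package's API changes between versions, so treat this as a starting point:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Load the embedding model through sentence-transformers.
model = SentenceTransformer("ibm-granite/granite-embedding-107m-multilingual")

# Evaluate on a single illustrative task; MTEB covers many more.
evaluation = MTEB(tasks=["STS22"])
results = evaluation.run(model, output_folder="results")
```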
Guide: Running Locally
- Clone the repository or download the model files from Hugging Face.
- Ensure you have PyTorch installed (`pip install torch`) along with the `transformers` library (`pip install transformers`).
- Load the model with the `transformers` library in Python and run feature extraction or sentence similarity (a minimal sketch follows this list).
- For optimal performance, especially with larger datasets, consider using cloud-based GPU services such as AWS, GCP, or Azure for faster computation.
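A minimal end-to-end sketch, assuming the Hugging Face repository id ibm-granite/granite-embedding-107m-multilingual and CLS-token pooling with L2 normalization (check the model card for the recommended pooling strategy):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "ibm-granite/granite-embedding-107m-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

sentences = [
    "The weather is lovely today.",
    "Il fait très beau aujourd'hui.",  # French paraphrase of the first sentence
]

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pool with the [CLS] token and L2-normalize (assumed pooling strategy).
embeddings = outputs.last_hidden_state[:, 0]
embeddings = torch.nn.functional.normalize(embeddings, dim=1)

# With normalized vectors, the dot product equals cosine similarity.
similarity = embeddings[0] @ embeddings[1]
print(f"cosine similarity: {similarity.item():.3f}")
```

Because the two sentences are paraphrases across languages, a well-trained multilingual model should yield a high similarity score here.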
License
The GRANITE-EMBEDDING-107M-MULTILINGUAL model is distributed under the Apache-2.0 license, permitting wide usage and distribution with minimal restrictions.