ModernBERT-Korean-Large-Preview
Introduction
The ModernBERT-Korean-Large-Preview is a sentence-transformers model fine-tuned for Korean sentence-similarity tasks. It maps sentences and paragraphs to a 1024-dimensional dense vector space, enabling applications such as semantic textual similarity, semantic search, paraphrase mining, and text classification.
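As a quick, hedged illustration of one of these applications, the `paraphrase_mining` utility from sentence-transformers scores every sentence pair in a small corpus; the Korean sentences below are invented for the example:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

model = SentenceTransformer("sigridjineth/ModernBERT-korean-large-preview")

corpus = [
    "고양이가 소파 위에서 잔다.",        # "The cat sleeps on the sofa."
    "소파 위에서 고양이가 자고 있다.",   # "A cat is sleeping on the sofa."
    "내일 회의는 오전 10시입니다.",      # "Tomorrow's meeting is at 10 a.m."
]

# Returns (score, i, j) triples sorted by descending similarity;
# the two paraphrases should rank above the unrelated pair.
for score, i, j in paraphrase_mining(model, corpus):
    print(f"{score:.3f}  {corpus[i]}  <->  {corpus[j]}")
```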
Architecture
The model is based on the Sentence Transformer architecture, using answerdotai/ModernBERT-large as the base model. Key architectural details include a maximum sequence length of 8192 tokens, an output dimensionality of 1024, and cosine similarity as the similarity function. The model pairs a Transformer encoder with a pooling layer configured for mean pooling over token embeddings to produce sentence embeddings.
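A minimal sketch of how such a stack is typically assembled in sentence-transformers; this reconstructs the described configuration from the base model rather than loading the released checkpoint, so treat the module wiring as illustrative:

```python
from sentence_transformers import SentenceTransformer, models

# Encoder: ModernBERT-large with the 8192-token context described above.
word_embedding = models.Transformer("answerdotai/ModernBERT-large", max_seq_length=8192)

# Pooling: mean over token embeddings yields one fixed-size sentence vector.
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")

model = SentenceTransformer(modules=[word_embedding, pooling])
print(model.get_sentence_embedding_dimension())  # 1024 (ModernBERT-large hidden size)
```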
Training
The model was trained on the korean_nli_dataset_reranker_v1 dataset, which consists of 1,120,235 samples, using the CachedMultipleNegativesRankingLoss function to optimize training. The training logs report a development-set cosine accuracy of 0.877, demonstrating the model's effectiveness on sentence-similarity tasks.
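The sketch below shows how this kind of training is typically wired up with the sentence-transformers trainer. The Hub path for korean_nli_dataset_reranker_v1, its column layout, and the mini_batch_size value are assumptions, not the model's actual training configuration:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("answerdotai/ModernBERT-large")

# Caches embedding chunks (GradCache) so large effective batch sizes fit in
# memory; mini_batch_size is an illustrative value, not the actual hyperparameter.
loss = CachedMultipleNegativesRankingLoss(model, mini_batch_size=32)

# Hypothetical Hub path and column layout: an (anchor, positive, negative)
# triplet format is typical for NLI-derived ranking data.
train_dataset = load_dataset("sigridjineth/korean_nli_dataset_reranker_v1", split="train")

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```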
Guide: Running Locally
To run the model locally:
- Install Dependencies: Ensure Python 3.11.9 and the necessary libraries, including sentence-transformers, transformers, torch, accelerate, datasets, and tokenizers, are installed:

  ```bash
  pip install sentence-transformers transformers torch accelerate datasets tokenizers
  ```
- Load the Model: Use the sentence-transformers library to load the model:

  ```python
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer('sigridjineth/ModernBERT-korean-large-preview')
  ```
- Inference: Prepare your sentences and use the model to generate embeddings (a fuller similarity example follows this list):

  ```python
  sentences = ["여기에 문장을 입력하세요."]  # "Enter your sentence here."
  embeddings = model.encode(sentences)
  ```
- GPU Recommendation: For optimal performance, consider using a CUDA-capable GPU, for example through a cloud service such as AWS EC2, Google Cloud, or Azure.
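Putting the steps together, the sketch below (referenced from the inference step above) encodes a few sentences and scores them pairwise. The example sentences and the device argument are assumptions, and `model.similarity` requires sentence-transformers >= 3.0:

```python
from sentence_transformers import SentenceTransformer

# device="cuda" assumes a CUDA GPU is available; drop it to run on CPU.
model = SentenceTransformer("sigridjineth/ModernBERT-korean-large-preview", device="cuda")

sentences = [
    "오늘 날씨가 정말 좋네요.",          # "The weather is really nice today."
    "하늘이 맑고 화창한 하루입니다.",    # "It's a clear, sunny day."
    "주가가 급락했습니다.",              # "Stock prices plunged."
]
embeddings = model.encode(sentences)  # shape: (3, 1024)

# Pairwise cosine similarity matrix; the first two sentences should score
# higher with each other than either does with the third.
print(model.similarity(embeddings, embeddings))
```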
License
The model and its components are distributed under the Apache License 2.0, allowing for both personal and commercial use with attribution. For specific terms and conditions, refer to the license documentation provided with the model.