mDeBERTa-v3-base-kor-further
lighthouse
Introduction
The mDeBERTa-v3-base-kor-further model is a language model developed by KPMG Lighthouse Korea. It is Microsoft's mDeBERTa-v3-base, further pre-trained on a large Korean corpus to improve performance on Korean NLP tasks. The model uses the DeBERTa architecture, whose disentangled attention and enhanced mask decoder represent a token's content and position separately, so that positional information is used more effectively than in models that simply add it to the word embeddings.
Architecture
The mDeBERTa-v3-base-kor-further model keeps the same architecture as the original mDeBERTa-v3-base from Microsoft. It has 12 layers with a hidden size of 768 and uses a SentencePiece model (SPM) vocabulary of about 250K tokens. As in the original model, attention is computed with relative position information, and the large shared vocabulary is what lets the model cover many languages.
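To make the effect of the large vocabulary concrete, the following is a rough back-of-the-envelope sketch of the size of the token embedding matrix. It assumes exactly 250,000 tokens (the text only says "250K") and counts only the word embedding table; the variable names are illustrative.

```python
# Rough size of the token embedding table, using the figures quoted above.
# Assumption: the vocabulary is taken as exactly 250,000 entries ("250K").
vocab_size = 250_000
hidden_size = 768

# One hidden_size-dimensional vector per vocabulary entry.
embedding_params = vocab_size * hidden_size

# Memory footprint if the table is stored in 32-bit floats (4 bytes each).
embedding_bytes_fp32 = embedding_params * 4

print(f"embedding parameters: {embedding_params:,}")
print(f"fp32 size: {embedding_bytes_fp32 / 1e6:.0f} MB")
```

Under these assumptions the embedding table alone holds about 192 million parameters, which is in line with the roughly 190M embedding parameters Microsoft reports for mDeBERTa-v3-base and is considerably more than the 12-layer backbone.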
Training
The model was further pre-trained on approximately 40GB of Korean data, including news articles, spoken and written texts, Korean Wikipedia, and public petitions. Pre-training used the Masked Language Model (MLM) task with a maximum sequence length of 512, a learning rate of 2e-5, a batch size of 8, 5 million training steps, and 50,000 warm-up steps. The goal of this additional training was to improve the model's performance on Korean language tasks.
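The training settings above imply a few derived quantities that are easy to check. The sketch below assumes that the batch size of 8 is the total number of sequences per step and that every sequence is filled to the maximum length, so the token count is an upper bound rather than a reported figure.

```python
# Derived quantities from the pre-training settings listed above.
train_steps = 5_000_000
warmup_steps = 50_000
batch_size = 8          # assumption: total sequences per optimizer step
max_seq_length = 512

# Share of training spent in warm-up.
warmup_fraction = warmup_steps / train_steps

# Number of sequences processed over the whole run.
sequences_seen = train_steps * batch_size

# Upper bound on tokens processed, if every sequence were 512 tokens long.
max_tokens_seen = sequences_seen * max_seq_length

print(f"warm-up fraction: {warmup_fraction:.1%}")
print(f"sequences seen:   {sequences_seen:,}")
print(f"tokens (at most): {max_tokens_seen:,}")
```

Under these assumptions warm-up covers 1% of the run, and the model processes 40 million sequences, or at most about 20.5 billion tokens.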
Guide: Running Locally
- Requirements
  - Install the required Python packages:
      pip install transformers
      pip install sentencepiece
- Usage
  - Load the model and tokenizer from Hugging Face's model hub:
      from transformers import AutoModel, AutoTokenizer

      model = AutoModel.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")
      tokenizer = AutoTokenizer.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")
- Hardware Recommendations
  - For efficient training and inference, consider using cloud-based GPUs. Providers like AWS, Google Cloud Platform, or Azure offer scalable GPU options suitable for NLP tasks.
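Once the model and tokenizer are loaded as in the Usage step, a typical next step is to turn sentences into vectors. The following is a minimal sketch, not part of the original model card: the `embed` helper and the choice of mean pooling over non-padding tokens are assumptions made for illustration, and PyTorch is assumed to be installed alongside `transformers`.

```python
MODEL_ID = "lighthouse/mdeberta-v3-base-kor-further"


def embed(sentences, model_id=MODEL_ID):
    """Return one mean-pooled vector per sentence (hypothetical helper)."""
    # Imported here so that defining the helper does not load the libraries.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    # Pad to the longest sentence and cap at the 512-token training length.
    batch = tokenizer(
        sentences,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )

    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, 768)

    # Mean pooling (an assumption): average only over real, non-padding tokens.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)


if __name__ == "__main__":
    vectors = embed(["안녕하세요.", "한국어 문장을 벡터로 변환합니다."])
    print(vectors.shape)
```

The base checkpoint has no task head, so these vectors are raw encoder features; for classification or similar tasks the model would normally be fine-tuned first.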
License
The mDeBERTa-v3-base-kor-further model is released under the MIT License, which allows free use, modification, and distribution provided the copyright and license notice are retained.