hu_core_news_lg
huspacy
Introduction
The hu_core_news_lg model is a Hungarian language processing pipeline built with the spaCy library and hosted on Hugging Face. It covers token classification tasks such as Named Entity Recognition (NER), part-of-speech tagging, morphological analysis, lemmatization, and dependency parsing.
Architecture
The model is a spaCy pipeline designed to process Hungarian text. Its components cover NER, part-of-speech tagging, and dependency parsing; the reported metrics also include morphological features, lemmas, and sentence segmentation.
Training
The model's reported evaluation metrics for token classification tasks are as follows:
- NER Precision: 0.8701
- NER Recall: 0.8681
- NER F-Score: 0.8691
- TAG (XPOS) Accuracy: 0.9677
- POS (UPOS) Accuracy: 0.9660
- Morph (UFeats) Accuracy: 0.9341
- Lemma Accuracy: 0.9762
- Unlabeled Attachment Score (UAS): 0.8435
- Labeled Attachment Score (LAS): 0.7813
- Sentences F-Score: 0.9866
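As a quick consistency check, the NER F-score listed above can be recomputed from the listed precision and recall. The sketch below assumes the reported F-score is the standard F1 (the harmonic mean of precision and recall); the function name `f_score` is illustrative and not part of spaCy.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics list above.
ner_precision = 0.8701
ner_recall = 0.8681

ner_f = f_score(ner_precision, ner_recall)
print(round(ner_f, 4))  # 0.8691, matching the reported NER F-Score
```

The recomputed value agrees with the listed figure to four decimal places, which suggests the three NER numbers come from the same evaluation run.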
Guide: Running Locally
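To run the hu_core_news_lg model locally, follow these steps:
- Install spaCy:
    pip install spacy
- Download the Hungarian model:
    python -m spacy download hu_core_news_lg
  If spaCy does not find a package under that name, install the model from its Hugging Face page instead.
- Load and use the model in your script:
    import spacy

    # Load the installed Hungarian pipeline
    nlp = spacy.load("hu_core_news_lg")
    doc = nlp("Add your Hungarian text here.")

    # Print each token with its coarse-grained part-of-speech tag
    for token in doc:
        print(token.text, token.pos_)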
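Beyond part-of-speech tags, the other annotations mentioned earlier (lemmas, morphological features, dependency relations, and named entities) are available on the same `Doc` object. The following is a minimal sketch using standard spaCy `Token` and `Span` attributes; the helper names `annotate` and `demo` are illustrative, and the labels the model actually emits are not listed in this document.

```python
def annotate(nlp, text):
    """Run a loaded spaCy pipeline on one text and collect its annotations.

    Returns (tokens, entities), where each token row is
    (text, lemma, UPOS, XPOS, morphological features, dependency label, head text)
    and each entity row is (text, label).
    """
    doc = nlp(text)
    tokens = [
        (t.text, t.lemma_, t.pos_, t.tag_, str(t.morph), t.dep_, t.head.text)
        for t in doc
    ]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


def demo():
    """Example run; requires spaCy and the hu_core_news_lg package to be installed."""
    import spacy  # imported here so annotate() itself has no import-time dependency

    nlp = spacy.load("hu_core_news_lg")
    tokens, entities = annotate(nlp, "Add your Hungarian text here.")
    for row in tokens:
        print(*row, sep="\t")
    for text, label in entities:
        print(text, label)
```

Calling `demo()` prints one tab-separated row per token, followed by any entities the pipeline finds.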
For large datasets, running on a GPU can shorten processing time. Cloud providers such as AWS, Google Cloud, and Azure offer GPU instances suited to this.
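For large inputs, batching usually matters as much as hardware. The sketch below streams texts through `nlp.pipe` in batches and asks spaCy to use a GPU when one is available via `spacy.prefer_gpu()`, which falls back to CPU otherwise. The helper names `iter_entities` and `load_pipeline` and the batch size of 64 are assumptions for illustration, not recommendations from the model's authors.

```python
def iter_entities(nlp, texts, batch_size=64):
    """Yield one list of (entity text, label) pairs per input text.

    nlp.pipe processes the texts in batches, which is typically faster
    than calling nlp(text) once per text.
    """
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield [(ent.text, ent.label_) for ent in doc.ents]


def load_pipeline(name="hu_core_news_lg"):
    """Load the pipeline, preferring a GPU if spaCy can use one."""
    import spacy  # imported here so iter_entities() has no import-time dependency

    # Must be called before the pipeline is loaded; returns False on CPU-only setups.
    spacy.prefer_gpu()
    return spacy.load(name)
```

A typical call would be `for ents in iter_entities(load_pipeline(), texts): ...`, where `texts` is any iterable of strings. Whether a GPU gives a meaningful speedup depends on the pipeline's architecture, so it is worth timing both configurations on a sample of the data.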
License
The hu_core_news_lg model is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0).