hu_core_news_md
huspacy
Introduction
hu_core_news_md is a Hungarian language model for the spaCy library. Besides token classification tasks such as named entity recognition (NER) and part-of-speech tagging, it performs morphological analysis, lemmatization, dependency parsing, and sentence segmentation.
Architecture
The model is a spaCy pipeline for Hungarian text. It covers named entity recognition, part-of-speech tagging (both universal UPOS tags and language-specific XPOS tags), morphological features, lemmatization, dependency parsing, and sentence segmentation, and evaluation results are reported for each of these tasks.
Training
The model has been evaluated on several tasks, yielding the following performance metrics:
- NER (Named Entity Recognition)
  - Precision: 0.8499
  - Recall: 0.8456
  - F-score: 0.8478
- TAG (XPOS)
  - Accuracy: 0.9710
- POS (UPOS)
  - Accuracy: 0.9685
- MORPH (UFeats)
  - Accuracy: 0.9432
- LEMMA
  - Accuracy: 0.9741
- UNLABELED_DEPENDENCIES
  - Unlabeled Attachment Score (UAS): 0.8184
- LABELED_DEPENDENCIES
  - Labeled Attachment Score (LAS): 0.7425
- SENTS
  - Sentence segmentation F-score: 0.98
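The metrics above follow standard definitions. As a rough illustration, the sketch below recomputes the NER F-score as the harmonic mean of precision and recall, computes exact-match span precision, recall, and F-score (the same idea applies to entity spans and sentence spans), and derives UAS and LAS for a toy Hungarian sentence. It assumes simplified textbook definitions and is not spaCy's own Scorer implementation, which may differ in details such as punctuation handling; the function names and toy data are hypothetical.

```python
# Sketch of the evaluation metrics above, using simplified textbook
# definitions (assumption: not spaCy's own Scorer implementation).


def f_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def span_prf(gold_spans, pred_spans):
    """Exact-match precision, recall, and F-score over spans.

    Spans are hashable tuples, e.g. (start, end) for sentences or
    (start, end, label) for entities; a prediction counts only if it
    matches a gold span exactly.
    """
    gold_set, pred_set = set(gold_spans), set(pred_spans)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    return precision, recall, f_score(precision, recall)


def attachment_scores(gold, pred):
    """Return (UAS, LAS) for one sentence.

    gold and pred hold one (head_index, dep_label) pair per token.
    UAS counts tokens with the correct head; LAS also requires the
    correct dependency label.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    if not gold:
        return 0.0, 0.0
    unlabeled = sum(g[0] == p[0] for g, p in zip(gold, pred))
    labeled = sum(g == p for g, p in zip(gold, pred))
    return unlabeled / len(gold), labeled / len(gold)


# NER: recompute F from the reported (rounded) precision and recall.
print(f"NER F-score: {f_score(0.8499, 0.8456):.4f}")

# Sentences: gold has two sentences, the prediction splits the second one.
print(span_prf([(0, 4), (4, 9)], [(0, 4), (4, 6), (6, 9)]))

# Hypothetical toy parse of "Anna olvassa a könyvet" (Anna reads the book);
# tokens are 0-indexed and the root token points to itself.
gold = [(1, "nsubj"), (1, "ROOT"), (3, "det"), (1, "obj")]
pred = [(1, "nsubj"), (1, "ROOT"), (3, "amod"), (2, "obj")]
print(attachment_scores(gold, pred))
```

Recomputing the F-score from the rounded precision and recall differs from the reported 0.8478 only in the fourth decimal place, as expected since the inputs are themselves rounded. LAS can never exceed UAS, because a labeled match also requires the correct head, which is consistent with the reported 0.7425 versus 0.8184.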
Guide: Running Locally
To run hu_core_news_md locally, follow these steps:
- Install spaCy: make sure spaCy is installed in your environment, for example via pip:

      pip install spacy
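- Download the model: use spaCy's download command:

      python -m spacy download hu_core_news_md

  This command only resolves models listed in spaCy's own model registry. hu_core_news_md is published by the huspacy project, so if the command fails, install the model package with pip from the wheel linked on the model's Hugging Face page instead.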
- Load the model in your Python script:

      import spacy

      nlp = spacy.load("hu_core_news_md")
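- Process text with the loaded model:

      # Replace the sample sentence with your own Hungarian text
      doc = nlp("Budapest Magyarország fővárosa.")
      for token in doc:
          # Token text, universal POS tag (UPOS), and dependency label
          print(token.text, token.pos_, token.dep_)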
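Because spaCy pipelines are distributed as ordinary Python packages, a script can check whether the model is available before calling spacy.load. The sketch below uses only the standard library's importlib.util.find_spec, so it works even before spaCy is imported; the helper name is_model_installed is hypothetical.

```python
import importlib.util


def is_model_installed(package_name: str) -> bool:
    """Return True if a top-level Python package with this name can be found.

    Assumption: the model is installed as a package whose import name
    matches the model name (e.g. hu_core_news_md). find_spec only locates
    the package; it does not load the model.
    """
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    name = "hu_core_news_md"
    if is_model_installed(name):
        print(f"{name} is installed.")
    else:
        print(f"{name} is not installed; install the model package first.")
```

Checking up front gives a clearer message than the error spacy.load raises when it cannot find a model.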
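For large volumes of text, a GPU can speed up processing, for example on cloud GPU instances from Google Cloud, AWS, or Azure. spaCy only uses a GPU when it is installed with GPU support and spacy.prefer_gpu() or spacy.require_gpu() is called before the model is loaded.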
License
The hu_core_news_md model is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0).