hu_core_news_lg Model

Introduction

hu_core_news_lg is a Hungarian language processing model built with the spaCy library and hosted on Hugging Face. It specializes in token classification tasks such as Named Entity Recognition (NER) and part-of-speech tagging.

Architecture

The model is a spaCy pipeline for Hungarian text. Judging by the metrics reported below, it covers NER, part-of-speech tagging, morphological analysis, lemmatization, dependency parsing, and sentence segmentation.

Training

The model's reported evaluation metrics are as follows:

NER Precision: 0.8701
NER Recall: 0.8681
NER F Score: 0.8691
TAG (XPOS) Accuracy: 0.9677
POS (UPOS) Accuracy: 0.9660
Morph (UFeats) Accuracy: 0.9341
Lemma Accuracy: 0.9762
Unlabeled Attachment Score (UAS): 0.8435
Labeled Attachment Score (LAS): 0.7813
Sentences F-Score: 0.9866
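
The NER figures above are consistent with the F score being the harmonic mean of precision and recall (the usual F1 definition, assumed here because the source does not state the formula). A minimal sketch that recomputes it from the two reported values; `f1_score` is an illustrative helper, not part of spaCy:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics list above
ner_precision = 0.8701
ner_recall = 0.8681

ner_f1 = f1_score(ner_precision, ner_recall)
print(f"{ner_f1:.4f}")  # prints 0.8691, matching the reported NER F Score
```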

Guide: Running Locally

To run the hu_core_news_lg model locally, follow these steps:

Install spaCy:
```
pip install spacy
```

Download the Hungarian model:

```
python -m spacy download hu_core_news_lg
```

Load and use the model in your script:

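```
import spacy

# Load the installed Hungarian pipeline
nlp = spacy.load("hu_core_news_lg")
doc = nlp("Add your Hungarian text here.")

# Print each token with its coarse-grained (UPOS) part-of-speech tag
for token in doc:
    print(token.text, token.pos_)
```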
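
A processed `doc` also exposes the named entities and dependency arcs that the model predicts. The following is a minimal sketch: `describe_doc` is a hypothetical helper name, and it relies only on spaCy's standard `doc.ents`, `ent.label_`, `token.dep_`, and `token.head` attributes.

```python
def describe_doc(doc):
    """Collect named entities and dependency arcs from a spaCy Doc.

    Returns a pair of lists:
      entities: (entity text, entity label)
      arcs:     (token text, dependency relation, head token text)
    """
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    arcs = [(token.text, token.dep_, token.head.text) for token in doc]
    return entities, arcs
```

With the model loaded as `nlp`, calling `describe_doc(nlp("Add your Hungarian text here."))` returns both lists. The label and relation names that appear depend on the model's annotation scheme.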

For large datasets, a GPU can shorten processing times. Cloud providers such as AWS, Google Cloud, and Azure offer GPU instances suited to this.
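
Independently of hardware, spaCy's `nlp.pipe` streams many texts through the pipeline in batches, which is typically faster than calling `nlp` on each text separately. This is a minimal sketch: `tag_texts` and the default batch size of 64 are illustrative choices, not settings taken from the model's documentation.

```python
def tag_texts(nlp, texts, batch_size=64):
    """Run texts through a spaCy pipeline in batches.

    Returns one list per input text, each holding (token text, UPOS tag) pairs.
    """
    results = []
    # nlp.pipe yields processed Doc objects in the same order as the input texts
    for doc in nlp.pipe(texts, batch_size=batch_size):
        results.append([(token.text, token.pos_) for token in doc])
    return results
```

With the model installed, `tag_texts(spacy.load("hu_core_news_lg"), texts)` processes a list of strings. The best batch size depends on text length and available memory, so treat 64 as a starting point.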

License

The hu_core_news_lg model is licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
