hu_core_news_lg

huspacy

Introduction

The hu_core_news_lg model is a Hungarian language processing model built with the spaCy library and hosted on Hugging Face. It specializes in token classification tasks such as Named Entity Recognition (NER) and part-of-speech tagging, and it also provides morphological analysis, lemmatization, dependency parsing, and sentence segmentation.

Architecture

The model is packaged as a spaCy pipeline for Hungarian text. Each linguistic task (NER, part-of-speech tagging, dependency parsing, and so on) is handled by a component in that pipeline, and the results are attached to the tokens of the processed document.
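As a rough illustration of what a spaCy pipeline looks like from the caller's side, the sketch below lists the components of a loaded pipeline together with the labels each one can predict. It assumes the model is already installed and loaded as `nlp`; `summarize_components` is a hypothetical helper name, and the actual component names depend on the model version.

```python
def summarize_components(nlp):
    """Map each pipeline component name to the labels it can predict.

    Components that expose no `labels` attribute (for example plain
    function components) are mapped to an empty list.
    """
    summary = {}
    for name in nlp.pipe_names:
        component = nlp.get_pipe(name)
        summary[name] = list(getattr(component, "labels", []))
    return summary
```

Calling `summarize_components(nlp)` on the loaded model returns a dictionary keyed by component name, which is a quick way to see which tag sets and entity types the pipeline uses.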

Training

The model's reported evaluation scores, on a scale from 0 to 1, are as follows:

  • NER Precision: 0.8701
  • NER Recall: 0.8681
  • NER F-Score: 0.8691
  • TAG (XPOS) Accuracy: 0.9677
  • POS (UPOS) Accuracy: 0.9660
  • Morph (UFeats) Accuracy: 0.9341
  • Lemma Accuracy: 0.9762
  • Unlabeled Attachment Score (UAS): 0.8435
  • Labeled Attachment Score (LAS): 0.7813
  • Sentence Segmentation F-Score: 0.9866
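The F-score in this list is the harmonic mean of precision and recall, and UAS and LAS are the shares of tokens whose head (UAS), or head and dependency label together (LAS), are predicted correctly. A minimal sketch of both calculations follows; the function names are illustrative assumptions, and this is not the evaluation code used to produce the scores above.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for parallel lists of (head_index, dep_label) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0, 0.0
    # UAS counts tokens with the correct head; LAS also requires the correct label.
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_both = sum(g == p for g, p in zip(gold, predicted))
    return correct_heads / len(gold), correct_both / len(gold)


# The reported NER precision and recall reproduce the reported F-score.
print(round(f_score(0.8701, 0.8681), 4))  # 0.8691
```

The gap between UAS (0.8435) and LAS (0.7813) therefore reflects tokens that were attached to the right head but given the wrong dependency label.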

Guide: Running Locally

To run the hu_core_news_lg model locally, follow these steps:

  1. Install spaCy:

    pip install spacy
    
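  2. Download the Hungarian model. It is published by the HuSpaCy project on Hugging Face rather than in spaCy's official model index, so spaCy's own command (python -m spacy download hu_core_news_lg) may not find it. One option is the huspacy helper package:

    pip install huspacy
    python -c "import huspacy; huspacy.download()"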
    
  3. Load and use the model in your script:

    import spacy
    nlp = spacy.load("hu_core_news_lg")
    doc = nlp("Add your Hungarian text here.")
    for token in doc:
        print(token.text, token.pos_)
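Beyond `token.pos_`, the same `doc` object carries the other annotations the model produces. The sketch below collects them into plain Python structures; `token_annotations` and `named_entities` are hypothetical helper names, while the attributes they read (`lemma_`, `tag_`, `morph`, `dep_`, `head`, `ents`) are standard spaCy `Token` and `Doc` attributes.

```python
def token_annotations(doc):
    """One row per token: text, lemma, UPOS, XPOS, morphology, dependency, head."""
    return [
        (
            token.text,
            token.lemma_,
            token.pos_,
            token.tag_,
            str(token.morph),
            token.dep_,
            token.head.text,
        )
        for token in doc
    ]


def named_entities(doc):
    """(text, label) pairs for every named entity found in the document."""
    return [(ent.text, ent.label_) for ent in doc.ents]
```

With `nlp` loaded as in step 3, `named_entities(nlp(text))` returns the recognized entities for a Hungarian string `text`, and `token_annotations` returns the per-token analysis.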
    

For large datasets, a GPU can shorten processing time. If no local GPU is available, cloud providers such as AWS, Google Cloud, and Azure offer GPU instances.
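For larger collections of text, spaCy's `nlp.pipe` processes documents in batches, which is usually faster than calling `nlp` on each string, and `spacy.prefer_gpu()` asks spaCy to use a GPU when one is available. The following is a sketch under those assumptions: `tag_corpus` and `load_model` are hypothetical helpers, and the batch size of 64 is an arbitrary starting point rather than a tuned value. Whether a GPU actually helps this pipeline would need to be measured.

```python
def tag_corpus(nlp, texts, batch_size=64):
    """Run the pipeline over many texts in batches and return POS tags per text."""
    results = []
    # nlp.pipe yields Doc objects in the same order as the input texts.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        results.append([(token.text, token.pos_) for token in doc])
    return results


def load_model(name="hu_core_news_lg", use_gpu=False):
    """Load the installed pipeline, optionally asking spaCy to use a GPU."""
    import spacy  # imported here so tag_corpus has no hard dependency on spaCy

    if use_gpu:
        # Must be called before loading; returns False and stays on CPU
        # when no GPU is available.
        spacy.prefer_gpu()
    return spacy.load(name)
```

A typical call would be `tag_corpus(load_model(), texts)`, where `texts` is any iterable of Hungarian strings.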

License

The hu_core_news_lg model is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0).
