hu_core_news_md

huspacy

Introduction

hu_core_news_md is a Hungarian spaCy pipeline published by huspacy for token-level annotation. It supports Named Entity Recognition (NER), part-of-speech tagging, morphological analysis, lemmatization, dependency parsing, and sentence segmentation.

Architecture

The model is a spaCy pipeline that annotates Hungarian text token by token. Its components cover NER, POS tagging, and syntactic dependencies, and each task is scored separately in the metrics listed in the next section.

Training

The model has been evaluated on each task it supports. The reported scores are:

  • NER (Named Entity Recognition)

    • Precision: 0.8499
    • Recall: 0.8456
    • F-score: 0.8478
  • TAG (XPOS)

    • Accuracy: 0.9710
  • POS (UPOS)

    • Accuracy: 0.9685
  • MORPH (UFeats)

    • Accuracy: 0.9432
  • LEMMA

    • Accuracy: 0.9741
  • UNLABELED_DEPENDENCIES

    • Unlabeled Attachment Score (UAS): 0.8184
  • LABELED_DEPENDENCIES

    • Labeled Attachment Score (LAS): 0.7425
  • SENTS

    • Sentences F-score: 0.98
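
To make the relationship between these figures concrete: the F-score is the harmonic mean of precision and recall, UAS is the share of tokens whose predicted head is correct, and LAS additionally requires the dependency label to be correct. The sketch below recomputes the NER F-score from the reported precision and recall and illustrates UAS and LAS on a tiny made-up parse. The function names and the toy data are hypothetical and are not part of spaCy or of the model.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for two equal-length lists of (head_index, label) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        return 0.0, 0.0
    # UAS counts tokens with the correct head; LAS also requires the correct label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / len(gold), full_hits / len(gold)


# Reported NER precision and recall from the list above.
ner_f = f_score(0.8499, 0.8456)
print(f"NER F-score recomputed: {ner_f:.4f}")

# Hypothetical 4-token parse: (head index, dependency label) per token.
gold = [(1, "nsubj"), (1, "ROOT"), (3, "det"), (1, "obj")]
pred = [(1, "nsubj"), (1, "ROOT"), (1, "det"), (1, "obl")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

The recomputed F-score agrees with the reported 0.8478 to within the rounding of the published precision and recall. LAS can never exceed UAS, because a token counts toward LAS only if its head is also correct, which matches the reported 0.7425 against 0.8184.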

Guide: Running Locally

To run hu_core_news_md locally, follow these steps:

  1. Install spaCy:
    Ensure you have spaCy installed in your environment. You can install it via pip:

    pip install spacy
    
  2. Download the Model:
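    Download the Hungarian model with spaCy's command-line interface:

    python -m spacy download hu_core_news_md

    The model is published by huspacy rather than bundled with spaCy itself, so if this command cannot find a compatible package, install the model from the huspacy repository on the Hugging Face Hub instead.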
    
  3. Load the Model:
    Load the model in your Python script:

    import spacy
    nlp = spacy.load("hu_core_news_md")
    
  4. Process Text:
    Process your text using the loaded model:

    doc = nlp("Add your Hungarian text here.")
    for token in doc:
        print(token.text, token.pos_, token.dep_)
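
Beyond part-of-speech and dependency labels, the same doc object exposes the other annotations the model produces. The following is a minimal sketch, assuming the model is installed under the name hu_core_news_md. The helper names load_hungarian_pipeline and summarize and the sample sentence are illustrative and are not part of spaCy or of the model. The code relies only on standard spaCy attributes (doc.sents, doc.ents, token.lemma_, token.tag_, token.morph, token.head, nlp.pipe_names) and prints a message instead of failing if spaCy or the model is missing.

```python
def load_hungarian_pipeline(name="hu_core_news_md"):
    """Return the loaded pipeline, or None if spaCy or the model is unavailable."""
    try:
        import spacy
        return spacy.load(name)
    except (ImportError, OSError) as exc:
        # spacy.load raises OSError when the named package is not installed.
        print(f"Could not load {name}: {exc}")
        return None


def summarize(doc):
    """Collect sentence, entity, and per-token annotations from a processed Doc."""
    return {
        "sentences": [sent.text for sent in doc.sents],
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
        "tokens": [
            (tok.text, tok.lemma_, tok.pos_, tok.tag_, str(tok.morph), tok.dep_, tok.head.text)
            for tok in doc
        ],
    }


nlp = load_hungarian_pipeline()
if nlp is not None:
    print("Pipeline components:", nlp.pipe_names)
    # Sample sentence, meaning "Budapest is the capital of Hungary."
    summary = summarize(nlp("Budapest Magyarország fővárosa."))
    for sentence in summary["sentences"]:
        print("Sentence:", sentence)
    for text, label in summary["entities"]:
        print("Entity:", text, label)
    for row in summary["tokens"]:
        print(*row, sep="\t")
```

Each token row holds the text, lemma, UPOS tag, XPOS tag, morphological features, dependency label, and the text of the syntactic head, which correspond to the LEMMA, POS, TAG, MORPH, and dependency metrics reported above.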
    

The pipeline runs on a CPU. For large workloads, a cloud GPU from a provider such as Google Cloud, AWS, or Azure may improve throughput, although GPUs typically benefit transformer-based pipelines most.
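
When many documents need processing, batching is often a cheaper first step than changing hardware. The following is a minimal sketch, assuming the pipeline loads as in the steps above: spaCy's nlp.pipe processes an iterable of texts in batches, and its batch_size argument sets how many texts go into each batch. The helper name entities_in_batches, the batch size of 64, and the sample texts are illustrative choices, not recommendations from the model authors.

```python
def entities_in_batches(nlp, texts, batch_size=64):
    """Yield (text, entities) pairs, letting spaCy process the texts in batches."""
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield doc.text, [(ent.text, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    try:
        import spacy
        nlp = spacy.load("hu_core_news_md")
    except (ImportError, OSError) as exc:
        # Either spaCy or the model package is not installed.
        print(f"Model not available: {exc}")
    else:
        # Sample sentences: "Budapest is the capital of Hungary." and
        # "The Danube flows through Budapest."
        texts = ["Budapest Magyarország fővárosa.", "A Duna Budapesten folyik át."]
        for text, entities in entities_in_batches(nlp, texts):
            print(text, entities)
```

Because the helper is a generator, results become available as each batch finishes rather than after the whole collection has been processed.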

License

The hu_core_news_md model is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0).
