en_core_web_lg

spacy

Introduction

en_core_web_lg is a large English pipeline optimized for CPU, developed by Explosion for the spaCy library. It supports various NLP tasks, including token classification, named entity recognition (NER), part-of-speech tagging, and dependency parsing.

Architecture

The pipeline consists of the following components: tok2vec, tagger, parser, senter, ner, attribute_ruler, and lemmatizer. It ships with 514,157 unique word vectors of 300 dimensions each. Its data sources are OntoNotes 5, ClearNLP Constituent-to-Dependency Conversion, WordNet 3.0, and Explosion Vectors.
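The component list and vector table can be checked against an installed copy by inspecting the loaded pipeline. The sketch below is illustrative only: the helper name `describe_pipeline` is hypothetical, and it assumes spaCy and en_core_web_lg are already installed. It reads `nlp.pipe_names`, `nlp.component_names`, and `nlp.vocab.vectors.shape`, and falls back to a message if the model cannot be loaded.

```python
def describe_pipeline(nlp):
    """Summarize a loaded spaCy pipeline (hypothetical helper, for illustration)."""
    n_rows, n_dims = nlp.vocab.vectors.shape
    return {
        "active": list(nlp.pipe_names),    # components that run when nlp(text) is called
        "all": list(nlp.component_names),  # also lists components that are disabled
        "vector_rows": n_rows,
        "vector_dims": n_dims,
    }


def main():
    try:
        import spacy
        nlp = spacy.load("en_core_web_lg")
    except (ImportError, OSError) as err:
        # spacy.load raises OSError when the model package is not installed
        print(f"Could not load en_core_web_lg: {err}")
        return
    for key, value in describe_pipeline(nlp).items():
        print(f"{key}: {value}")


if __name__ == "__main__":
    main()
```

Comparing "active" with "all" shows whether any listed component is present in the package but not run by default.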

Training

Reported accuracy scores for the en_core_web_lg model:

  • NER Precision: 85.16%
  • NER Recall: 85.70%
  • NER F-Score: 85.43%
  • TAG (XPOS) Accuracy: 97.35%
  • Unlabeled Attachment Score (UAS): 92.08%
  • Labeled Attachment Score (LAS): 90.27%
  • Sentences F-Score: 90.71%
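The NER figures above are consistent with one another if the F-score is taken to be the harmonic mean of precision and recall. That is the standard F1 definition and an assumption here, since the source does not state the formula. A short check, with `f1_score` as an illustrative helper name:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# NER precision and recall as listed above, in percent
ner_f = f1_score(85.16, 85.70)
print(round(ner_f, 2))  # 85.43
```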

Guide: Running Locally

  1. Prerequisites: Ensure you have Python and spaCy installed. The model requires spaCy version >=3.7.2 and <3.8.0.
  2. Installation: Use the following command to download and install the en_core_web_lg model:
    python -m spacy download en_core_web_lg
    
  3. Usage: Load the model in your Python script and call it on text to get a processed document:
    import spacy
    nlp = spacy.load("en_core_web_lg")  # load the installed pipeline
    doc = nlp("Your text here")  # run the pipeline components on the text
    
  4. Cloud GPUs: The pipeline is optimized for CPU and does not require a GPU. For large workloads, cloud services such as AWS, Google Cloud, or Azure can provide additional compute.
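After a text has been processed as in the usage step, the annotations produced by the tagger, parser, lemmatizer, and NER components can be read from the tokens and entity spans. The following is a minimal sketch, assuming the model is installed; the helper name `annotations` and the sample sentence are illustrative and not part of the original guide.

```python
def annotations(doc):
    """Return (token rows, entity rows) from a processed spaCy Doc."""
    tokens = [
        # text, lemma, fine-grained POS tag, dependency label, syntactic head
        (t.text, t.lemma_, t.tag_, t.dep_, t.head.text)
        for t in doc
    ]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


def main():
    try:
        import spacy
        nlp = spacy.load("en_core_web_lg")
    except (ImportError, OSError) as err:
        # spacy.load raises OSError when the model package is not installed
        print(f"Could not load en_core_web_lg: {err}")
        return
    doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")
    tokens, entities = annotations(doc)
    for row in tokens:
        print(*row, sep="\t")
    for text, label in entities:
        print(f"{text}\t{label}")


if __name__ == "__main__":
    main()
```

Returning plain tuples keeps the helper independent of spaCy objects, so the rows can be written to a file or a table without further conversion.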

License

The en_core_web_lg model is distributed under the MIT License, allowing for broad use and modification with proper attribution.
