en_core_web_lg
Introduction
en_core_web_lg is a large English pipeline for the spaCy library, developed by Explosion and optimized for CPU. It supports token classification tasks such as named entity recognition (NER), part-of-speech tagging, and dependency parsing.
Architecture
The pipeline includes seven components: tok2vec, tagger, parser, senter, ner, attribute_ruler, and lemmatizer. It ships with 514,157 unique word vectors of 300 dimensions each. The model sources include OntoNotes 5, ClearNLP Constituent-to-Dependency Conversion, WordNet 3.0, and Explosion Vectors.
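The component list and vector table described above can be checked against an installed copy of the pipeline. The following is a minimal sketch, not an official recipe: `describe_pipeline` and `demo` are hypothetical helper names, while `component_names`, `pipe_names`, and `vocab.vectors.shape` are spaCy v3 attributes. It assumes the model has already been downloaded.

```python
def describe_pipeline(nlp):
    """Summarize a loaded spaCy pipeline (hypothetical helper).

    component_names lists every component in the package,
    pipe_names lists only the components that are currently active.
    """
    n_vectors, width = nlp.vocab.vectors.shape  # (rows, dimensions)
    return {
        "all_components": list(nlp.component_names),
        "active_components": list(nlp.pipe_names),
        "n_vectors": n_vectors,
        "vector_width": width,
    }


def demo():
    """Load en_core_web_lg and print the summary (requires the model)."""
    import spacy

    nlp = spacy.load("en_core_web_lg")
    for key, value in describe_pipeline(nlp).items():
        print(f"{key}: {value}")
```

Calling `demo()` prints both lists. The senter component is normally shipped disabled by default because the parser already sets sentence boundaries, so it may appear only under `all_components`.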
Training
The en_core_web_lg model reports the following accuracy scores:
- NER Precision: 85.16%
- NER Recall: 85.70%
- NER F-Score: 85.43%
- TAG (XPOS) Accuracy: 97.35%
- Unlabeled Attachment Score (UAS): 92.08%
- Labeled Attachment Score (LAS): 90.27%
- Sentences F-Score: 90.71%
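The NER figures above are consistent with each other if the F-score is read as the standard F1, the harmonic mean of precision and recall. That reading is an assumption here, and `f_score` is a hypothetical helper written only to check the arithmetic.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed in the metrics above, in percent.
ner_f = f_score(85.16, 85.70)
print(f"NER F-score: {ner_f:.2f}%")  # prints "NER F-score: 85.43%"
```

The same check cannot be run for the sentence F-score, because its precision and recall are not listed.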
Guide: Running Locally
- Prerequisites: Ensure you have Python and spaCy installed. The model requires spaCy version >=3.7.2 and <3.8.0.
- Installation: Use the following command to download and install the en_core_web_lg model:
  python -m spacy download en_core_web_lg
- Usage: Load the model in your Python script and utilize it for various NLP tasks:
  import spacy
  nlp = spacy.load("en_core_web_lg")
  doc = nlp("Your text here")
- Cloud GPUs: The pipeline is optimized for CPU and does not require a GPU. If you still want GPU acceleration for large workloads, consider cloud GPU services like AWS, Google Cloud, or Azure.
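The prerequisite and usage steps above can be sketched as a small script that checks the installed spaCy version against the stated range and then reads the annotations off a processed document. This is an illustration under stated assumptions: `parse_version`, `is_compatible`, `summarize`, and `demo` are hypothetical helpers, the sample sentence is arbitrary, and the version check simply compares the leading major.minor.patch numbers.

```python
import re

MIN_VERSION = (3, 7, 2)  # inclusive lower bound from the prerequisites
MAX_VERSION = (3, 8, 0)  # exclusive upper bound from the prerequisites


def parse_version(version):
    """Return the leading major.minor.patch numbers as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible(version):
    """True if the version satisfies >=3.7.2 and <3.8.0."""
    return MIN_VERSION <= parse_version(version) < MAX_VERSION


def summarize(doc):
    """Collect token-level annotations and named entities from a Doc."""
    tokens = [(t.text, t.pos_, t.tag_, t.dep_, t.lemma_) for t in doc]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


def demo(text="Apple is looking at buying a U.K. startup."):
    """Run the pipeline on sample text (requires spaCy and the model)."""
    import spacy

    if not is_compatible(spacy.__version__):
        raise RuntimeError(f"spaCy {spacy.__version__} is outside >=3.7.2,<3.8.0")
    nlp = spacy.load("en_core_web_lg")
    tokens, entities = summarize(nlp(text))
    for row in tokens:
        print(row)  # (text, coarse POS, fine-grained tag, dependency, lemma)
    print(entities)
```

Calling `demo()` prints one tuple per token followed by the list of entities. Keeping `summarize` separate from loading means the model is loaded once and can be reused across many texts.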
License
The en_core_web_lg model is distributed under the MIT License, allowing for broad use and modification with proper attribution.