GLiNER Multi PII Domains v1 (E3-JSI/gliner-multi-pii-domains-v1)
Introduction
GLiNER is a Named Entity Recognition (NER) model designed to identify any entity type using a bidirectional transformer encoder, similar to BERT. It provides a practical alternative to traditional NER models, which are limited to predefined entities, and Large Language Models (LLMs), which can be costly and resource-intensive. GLiNER is particularly effective in recognizing various types of personally identifiable information (PII).
Architecture
GLiNER uses a transformer-based architecture, fine-tuned from the urchade/gliner_multi_pii-v1 model and leveraging the synthetic dataset E3-JSI/synthetic-multi-pii-ner-v1. It is capable of recognizing a wide range of PII, including names, social security numbers, birth dates, and numerous other entity types.
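In practice, these entity types are expressed as free-form label strings passed to the model at inference time; the label names below are illustrative examples rather than a fixed schema.
# Illustrative PII label strings; any descriptive label names can be supplied
labels = ["person", "social security number", "date of birth", "email address"]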
Training
The model was trained on the E3-JSI/synthetic-multi-pii-ner-v1 dataset to enhance its ability to identify diverse PII entities across multiple domains. This fine-tuning enables the model to perform effectively in different language contexts, supporting nine languages, including English, French, and German.
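As a minimal multilingual sketch (the French sentence and the label names are illustrative; the model identifier is the one from this card):
from gliner import GLiNER

# Load the multi-domain PII model
model = GLiNER.from_pretrained("E3-JSI/gliner-multi-pii-domains-v1")

# French example sentence; English label strings are used here for illustration
text = "Jean Dupont est né le 4 juillet 1982 à Lyon."
labels = ["person", "date of birth", "city"]

for entity in model.predict_entities(text, labels, threshold=0.5):
    print(entity["text"], "=>", entity["label"])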
Guide: Running Locally
To use the GLiNER model locally, follow these steps; a complete example combining them appears after the list.
- Install the GLiNER Library:
  pip install gliner
- Load the Model:
  Use the GLiNER library to load the model.
  from gliner import GLiNER
  model = GLiNER.from_pretrained("E3-JSI/gliner-multi-pii-domains-v1")
- Prepare Text and Labels:
  Define the text and the labels (entity types) you wish to extract.
  text = "Your text here"
  labels = ["entity1", "entity2"]
- Extract Entities:
  Use the model to predict entities.
  entities = model.predict_entities(text, labels, threshold=0.5)
  for entity in entities:
      print(entity["text"], "=>", entity["label"])
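Putting the steps together, a minimal end-to-end sketch might look like the following; the sample text, the label names, and the 0.5 threshold are illustrative and should be adapted to your data.
from gliner import GLiNER

# Load the fine-tuned multi-domain PII model
model = GLiNER.from_pretrained("E3-JSI/gliner-multi-pii-domains-v1")

# Example input and PII label names (illustrative)
text = "Maria Schmidt was born on 12 March 1985 and lives at 42 Baker Street, London."
labels = ["person", "date of birth", "address"]

# Predict entities at or above the confidence threshold and print them
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(entity["text"], "=>", entity["label"])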
For enhanced performance, especially on larger datasets, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
License
The GLiNER model is licensed under the Apache-2.0 License, allowing for both personal and commercial use with attribution.