gliner_multi_pii v1
urchade
Introduction
GLiNER is a Named Entity Recognition (NER) model designed to identify various entity types, particularly focusing on personally identifiable information (PII). Utilizing a bidirectional transformer encoder similar to BERT, GLiNER offers a resource-efficient alternative to traditional NER models and large language models (LLMs), which can be costly and cumbersome.
Architecture
GLiNER employs a bidirectional transformer encoder architecture, optimized for token classification tasks, allowing it to identify a wide range of entity types from text data.
Training
The model is fine-tuned from urchade/gliner_multi-v2.1 on the urchade/synthetic-pii-ner-mistral-v1 dataset. This training enables it to recognize numerous PII types, including names, organizations, phone numbers, addresses, emails, and more.
Guide: Running Locally
- Setup Environment: Ensure you have Python and PyTorch installed. You can use virtual environments for isolation.
- Install GLiNER: Use pip to install the GLiNER library.
    pip install gliner
- Load the Model: Import and load the model using the following code:
    from gliner import GLiNER
    model = GLiNER.from_pretrained("urchade/gliner_multi_pii-v1")
- Prepare Text: Create a text variable containing the data you want to analyze.
- Predict Entities: Use the predict_entities method with the text and the desired labels.
    text = "Your text here"
    labels = ["person", "email", "phone number"]  # Define labels; add more as needed
    entities = model.predict_entities(text, labels)
- Cloud GPUs: For enhanced performance, consider using cloud services like AWS, Google Cloud, or Azure to leverage GPU instances.
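Once entities have been predicted, a common next step is to mask them in the original text. The sketch below is a minimal illustration, not part of the GLiNER library: the helper name redact and its min_score parameter are hypothetical, and it assumes each entity is a dict with "start", "end", "label", and "score" keys and that the spans do not overlap. The sample entities are hand-written so the snippet runs without downloading the model.

```python
from typing import Any, Dict, List


def redact(text: str, entities: List[Dict[str, Any]], min_score: float = 0.5) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    Hypothetical helper: assumes non-overlapping spans described by
    character offsets in "start" (inclusive) and "end" (exclusive).
    """
    # Drop low-confidence predictions; entities without a score are kept.
    kept = [e for e in entities if e.get("score", 1.0) >= min_score]
    # Replace from the end of the text so earlier offsets stay valid.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = "[" + str(ent["label"]).upper().replace(" ", "_") + "]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written sample in the assumed format (not real model output).
sample_text = "Contact Jane Doe at jane@example.com."
sample_entities = [
    {"start": 8, "end": 16, "text": "Jane Doe", "label": "person", "score": 0.97},
    {"start": 20, "end": 36, "text": "jane@example.com", "label": "email", "score": 0.99},
]

print(redact(sample_text, sample_entities))
```

With a loaded model, the list returned by model.predict_entities(text, labels) would take the place of sample_entities, provided it matches the assumed format. Raising min_score trades recall for precision, so fewer spans are masked.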
License
The GLiNER model is released under the Apache 2.0 license, allowing for flexibility in use and redistribution, provided that appropriate credit is given to the original authors.