gliner multi pii domains v1

E3-JSI

GLINER MULTI PII DOMAINS

Introduction

GLiNER is a Named Entity Recognition (NER) model that can identify arbitrary entity types using a bidirectional transformer encoder similar to BERT. It offers a practical alternative both to traditional NER models, which are limited to predefined entity types, and to Large Language Models (LLMs), which can be costly and resource-intensive to run. This variant is fine-tuned to recognize many types of personally identifiable information (PII).

Architecture

The model uses GLiNER's transformer-based architecture. It is fine-tuned from the urchade/gliner_multi_pii-v1 model on the synthetic dataset E3-JSI/synthetic-multi-pii-ner-v1. It can recognize a wide range of PII, including names, social security numbers, birth dates, and numerous other entity types.

Training

The model was fine-tuned on the E3-JSI/synthetic-multi-pii-ner-v1 dataset to improve its ability to identify diverse PII entities across multiple domains. It supports nine languages, including English, French, and German.

Guide: Running Locally

To use the GLiNER model locally, follow these steps:

  1. Install the GLiNER Library:

    pip install gliner
    
  2. Load the Model:
    Use the GLiNER library to load the model.

    from gliner import GLiNER
    model = GLiNER.from_pretrained("E3-JSI/gliner-multi-pii-domains-v1")
    
  3. Prepare Text and Labels:
    Define the text and labels (entities) you wish to extract.

    text = "Your text here"
    labels = ["entity1", "entity2"]
    
  4. Extract Entities:
    Use the model to predict entities.

    entities = model.predict_entities(text, labels, threshold=0.5)
    for entity in entities:
        print(entity["text"], "=>", entity["label"])
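
A common follow-up to the extraction step is masking the detected PII. The sketch below is illustrative only: it assumes each entity returned by predict_entities is a dict with "start", "end", and "label" keys, where "start" and "end" are character offsets into the input text (end exclusive), and that spans do not overlap. The redact helper and the sample data are hypothetical, not part of the GLiNER library or real model output.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each entity span with a [LABEL] placeholder."""
    # Process spans from right to left so earlier offsets stay valid
    # while the string is being modified.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = "[" + ent["label"].upper().replace(" ", "_") + "]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written sample in the assumed output shape (not real predictions).
sample_text = "Contact Jane Doe at jane@example.com."
sample_entities = [
    {"start": 8, "end": 16, "text": "Jane Doe", "label": "person", "score": 0.98},
    {"start": 20, "end": 36, "text": "jane@example.com", "label": "email", "score": 0.95},
]

print(redact(sample_text, sample_entities))
```

With a loaded model, the same helper could be called as redact(text, model.predict_entities(text, labels, threshold=0.5)). Raising the threshold trades recall for precision, so for redaction a lower value may be the safer choice.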
    

For faster inference, especially on large datasets, consider running the model on a GPU, for example a cloud instance from AWS, Google Cloud, or Azure.
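
For larger datasets, processing texts in batches usually makes better use of a GPU than calling the model one text at a time. The following is a minimal sketch under stated assumptions: batched and extract_all are hypothetical helper names, and the code assumes the installed gliner version provides a batch_predict_entities method and that the model can be moved to a device with the usual PyTorch .to() call. Check both against the library version in use.

```python
from typing import Dict, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def extract_all(model, texts: Sequence[str], labels: List[str],
                batch_size: int = 8, threshold: float = 0.5) -> List[List[Dict]]:
    """Run extraction batch by batch; returns one entity list per input text."""
    results: List[List[Dict]] = []
    for batch in batched(texts, batch_size):
        # Assumption: the model object exposes batch_predict_entities.
        results.extend(
            model.batch_predict_entities(batch, labels, threshold=threshold)
        )
    return results


# Typical use (not executed here; requires the model and, optionally, a GPU):
#   import torch
#   from gliner import GLiNER
#   model = GLiNER.from_pretrained("E3-JSI/gliner-multi-pii-domains-v1")
#   model = model.to("cuda" if torch.cuda.is_available() else "cpu")
#   all_entities = extract_all(model, texts, labels, batch_size=8)
```

The batch size is a tuning knob: larger batches improve throughput until GPU memory runs out, so it is worth starting small and increasing gradually.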

License

The model is licensed under the Apache-2.0 License, which permits both personal and commercial use provided the license text and attribution notices are retained.
