gliner_multi_pii-v1

urchade

Introduction

GLiNER is a Named Entity Recognition (NER) model that can identify many entity types; this checkpoint focuses on personally identifiable information (PII). It uses a bidirectional transformer encoder similar to BERT, which makes it a resource-efficient alternative both to traditional NER models, which are limited to a predefined set of entity types, and to large language models (LLMs), which are flexible but costly to run.

Architecture

GLiNER is built on a bidirectional transformer encoder and is applied to token classification. Entity types are passed in as plain-text labels at inference time, so one model can extract a wide range of entity types from text.

Training

The model is fine-tuned from urchade/gliner_multi-v2.1 on the urchade/synthetic-pii-ner-mistral-v1 dataset. This training enables it to recognize numerous PII types, including names, organizations, phone numbers, addresses, emails, and more.

Guide: Running Locally

  1. Set Up Environment: Ensure you have Python and PyTorch installed. A virtual environment keeps the installation isolated from other projects.
  2. Install GLiNER: Use pip to install the GLiNER library.
    pip install gliner
    
  3. Load the Model: Import and load the model using the following code:
    from gliner import GLiNER
    model = GLiNER.from_pretrained("urchade/gliner_multi_pii-v1")
    
  4. Prepare Text: Create a text variable containing the data you want to analyze.
  5. Predict Entities: Use the predict_entities method with the text and desired labels.
    text = "Your text here"
    labels = ["person", "email", "phone number"]  # extend with further PII labels as needed
    entities = model.predict_entities(text, labels)
    
  6. Cloud GPUs: For faster inference on large volumes of text, consider GPU instances from a cloud provider such as AWS, Google Cloud, or Azure.
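
The steps above can be combined into a small script. The following is a minimal sketch, not part of the original guide: it assumes that predict_entities returns a list of dictionaries with "start", "end", and "label" keys (as shown in the GLiNER README), that detected spans do not overlap, and that the model can be moved to a GPU like a regular PyTorch module. The helper names redact_entities and detect_and_redact are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def redact_entities(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a bracketed, upper-cased label.

    Assumes each entity dict has "start", "end", and "label" keys and
    that the spans do not overlap.
    """
    # Work from right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['label'].upper()}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


def detect_and_redact(text: str, labels: List[str], threshold: float = 0.5) -> str:
    """Hypothetical wrapper: load the model, predict entities, redact them."""
    # Imported lazily so redact_entities works without gliner installed.
    import torch
    from gliner import GLiNER

    model = GLiNER.from_pretrained("urchade/gliner_multi_pii-v1")
    # Assumption: the model behaves like a regular PyTorch module here.
    if torch.cuda.is_available():
        model = model.to("cuda")
    entities = model.predict_entities(text, labels, threshold=threshold)
    return redact_entities(text, entities)
```

A call such as detect_and_redact("Contact Jane Doe at jane@example.com.", ["person", "email"]) downloads the model on first use and returns the text with whatever spans the model detects replaced by their labels. For repeated calls it would be more sensible to load the model once and reuse it; the sketch reloads it each time only to stay short.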

License

The GLiNER model is released under the Apache 2.0 license, allowing for flexibility in use and redistribution, provided that appropriate credit is given to the original authors.
