NuExtract

numind

Introduction

NuExtract is an information-extraction model developed by NuMind, fine-tuned from phi-3-mini on a private, high-quality synthetic dataset. Users provide an input text together with a JSON template describing the fields to extract, and the model fills in that template with values taken from the text.
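As a rough illustration of what such an input might look like, the sketch below builds a template and a sample text in Python. The field names (`name`, `founded`, `products`) and the sample sentence are invented for illustration, and the convention that empty strings and empty lists mark the slots to fill is an assumption; the model card is the authority on the exact template rules.

```python
import json

# Hypothetical template: the field names are illustrative, not from the model card.
# Assumption: empty strings and empty lists mark the values to be extracted.
template = {
    "company": {
        "name": "",
        "founded": "",
        "products": [],
    }
}

# Hypothetical input text that the values would be extracted from.
text = "Acme Corp, founded in 1999, sells anvils and rocket skates."

# Serialize the template to a JSON string, the form passed alongside the text.
schema = json.dumps(template, indent=4)
print(schema)
```

Keeping the template as a Python dict and serializing it with `json.dumps` avoids hand-written JSON errors such as trailing commas.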

Architecture

NuExtract is a fine-tuned version of the phi-3-mini model. It is purely extractive: every value it outputs is copied as-is from the input text rather than freely generated. The model is also available in other sizes, including tiny (0.5B) and large (7B) versions.
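The extractive property can be checked mechanically: every string in the model's output should occur verbatim in the input. The following is a minimal sketch of such a check, not part of NuExtract itself; `iter_leaf_strings` and `non_extractive_values` are hypothetical helper names, and the sample data is invented.

```python
from typing import Any, Iterator, List


def iter_leaf_strings(value: Any) -> Iterator[str]:
    """Yield every non-empty string found in a nested dict/list structure."""
    if isinstance(value, str):
        if value:
            yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_leaf_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_leaf_strings(item)


def non_extractive_values(text: str, extracted: Any) -> List[str]:
    """Return the extracted strings that do not occur verbatim in the text."""
    return [s for s in iter_leaf_strings(extracted) if s not in text]


# Invented example: all values below appear word for word in the text.
text = "Acme Corp, founded in 1999, sells anvils and rocket skates."
extracted = {
    "company": {
        "name": "Acme Corp",
        "founded": "1999",
        "products": ["anvils", "rocket skates"],
    }
}
print(non_extractive_values(text, extracted))  # prints []
```

A non-empty result would flag values that were paraphrased or invented rather than copied, which is a cheap sanity check to run after parsing the model's JSON output.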

Training

The model was trained on a proprietary dataset to enhance its ability to extract structured information from text. Fine-tuning details are available in a blog post linked within the model documentation.

Guide: Running Locally

To run NuExtract locally, follow these steps:

  1. Install Dependencies: Ensure you have the transformers and torch libraries installed.
  2. Load Model and Tokenizer:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True allows code shipped in the model repository to run
    model = AutoModelForCausalLM.from_pretrained("numind/NuExtract", torch_dtype=torch.bfloat16, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("numind/NuExtract", trust_remote_code=True)
    model.to("cuda")  # requires a CUDA-capable GPU
    model.eval()      # switch to inference mode
    
  3. Prepare Input: Define your text and JSON schema for extraction.
  4. Predict:
    def predict_NuExtract(model, tokenizer, text, schema, example=["", "", ""]):
        # Function implementation (omitted here)
        ...

    prediction = predict_NuExtract(model, tokenizer, text, schema)
    print(prediction)
    
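  5. Cloud GPU Recommendation: The loading code moves the model to a CUDA device, so a GPU is needed. If none is available locally, cloud services like AWS, GCP, or Azure offer GPU instances for efficient model inference.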
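
The steps above leave the body of `predict_NuExtract` out. The sketch below shows one way the prompt assembly and output parsing could look, without loading the model. The marker strings (`<|input|>`, `### Template:`, `### Text:`, `<|output|>`, `<|end-output|>`) are assumptions and should be verified against the model card; `build_prompt` and `parse_output` are hypothetical helper names, and the sample text is invented.

```python
import json

# Assumed marker strings; verify them against the model card before use.
INPUT_TAG = "<|input|>"
OUTPUT_TAG = "<|output|>"
END_TAG = "<|end-output|>"


def build_prompt(text: str, schema: str) -> str:
    """Assemble a prompt from the input text and a JSON template string."""
    # Re-serialize so the template is valid JSON with consistent indentation.
    pretty = json.dumps(json.loads(schema), indent=4)
    return f"{INPUT_TAG}\n### Template:\n{pretty}\n### Text:\n{text}\n{OUTPUT_TAG}\n"


def parse_output(decoded: str) -> str:
    """Return the text between the output marker and the end marker."""
    after = decoded.split(OUTPUT_TAG, 1)[1]
    return after.split(END_TAG, 1)[0].strip()


prompt = build_prompt("Acme Corp was founded in 1999.", '{"name": "", "founded": ""}')

# Simulated decoded model output, used only to exercise the parser.
fake_decoded = prompt + '{"name": "Acme Corp", "founded": "1999"}' + END_TAG
print(json.loads(parse_output(fake_decoded)))
```

With the real model, the prompt would presumably be tokenized with `tokenizer(prompt, return_tensors="pt")`, passed to `model.generate`, decoded with `tokenizer.decode`, and then handed to `parse_output`. Keeping prompt construction and parsing in separate pure functions makes them testable without a GPU.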

License

NuExtract is released under the MIT License.
