NuExtract
numind
Introduction
NuExtract is an information extraction model developed by NuMind, based on the phi-3-mini architecture and fine-tuned on a private, high-quality synthetic dataset. Users provide an input text and a JSON template that describes the information to extract.
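As a rough illustration of the two inputs, the sketch below pairs a short passage with a JSON template. The field names and the sample text are made up for this example, and leaving each value as an empty string for the model to fill is an assumed convention that should be checked against the model card.

```python
import json

# Hypothetical input text; any passage containing the facts of interest works.
text = "NuExtract is released by NuMind under the MIT License."

# Hypothetical template: keys name the fields to extract, values are left
# empty (assumed convention) so the model can fill them in.
template = {
    "Model": {
        "Name": "",
        "Developer": "",
        "License": "",
    }
}

# The template is handed to the model as a JSON string.
schema = json.dumps(template, indent=4)
print(schema)
```

Nesting the fields under a parent key such as "Model" is optional; a flat template would be serialized the same way.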
Architecture
NuExtract is a fine-tuned version of the phi-3-mini model. It is purely extractive: every value it outputs is text taken directly from the input. The model is also available in other sizes, including tiny (0.5B) and large (7B) versions.
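Because the output is meant to be copied from the input, one simple sanity check is to confirm that every extracted string occurs in the source text. The helper below is a minimal sketch of such a check; the function names are hypothetical and it is not part of NuExtract itself. It uses exact substring matching, so it would flag values whose whitespace or casing differs from the source.

```python
import json


def iter_leaf_strings(node):
    """Yield every string leaf in a nested structure of dicts and lists."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_leaf_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_leaf_strings(item)
    elif isinstance(node, str):
        yield node


def is_extractive(output_json, source_text):
    """Return True if every non-empty extracted string occurs in source_text."""
    parsed = json.loads(output_json)
    return all(s in source_text for s in iter_leaf_strings(parsed) if s)


source = "NuExtract is a fine-tuned version of the phi-3-mini model."
copied = '{"Model": {"Name": "NuExtract", "Base": "phi-3-mini"}}'
invented = '{"Model": {"Name": "NuExtract", "Base": "GPT-4"}}'

print(is_extractive(copied, source))    # True: both values appear in the source
print(is_extractive(invented, source))  # False: "GPT-4" is not in the source
```

Empty strings are skipped, so fields the model left unfilled do not count as violations.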
Training
The model was trained on a proprietary dataset to enhance its ability to extract structured information from text. Fine-tuning details are available in a blog post linked within the model documentation.
Guide: Running Locally
To run NuExtract locally, follow these steps:
- Install Dependencies: Ensure you have the transformers library installed, along with PyTorch (torch), which the loading code relies on.
- Load Model and Tokenizer:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the weights in bfloat16 to reduce memory use
    model = AutoModelForCausalLM.from_pretrained("numind/NuExtract", torch_dtype=torch.bfloat16, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("numind/NuExtract", trust_remote_code=True)

    # Move the model to the GPU and switch to inference mode
    model.to("cuda")
    model.eval()
- Prepare Input: Define your text and JSON schema for extraction.
- Predict:
    def predict_NuExtract(model, tokenizer, text, schema, example=["", "", ""]):
        # Function implementation
        ...

    prediction = predict_NuExtract(model, tokenizer, text, schema)
    print(prediction)
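- Cloud GPU Recommendation: The loading code moves the model to a CUDA device, so a GPU is required. If none is available locally, cloud services like AWS, GCP, or Azure provide GPU instances for efficient model inference.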
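The Prepare Input and Predict steps above leave the body of predict_NuExtract unspecified. The sketch below shows one way such a function could be written. The prompt layout and the marker strings ("<|input|>", "<|output|>", "<|end-output|>", and the "###" section headers) are assumptions based on the format described on the model card and should be verified there. The helper names build_prompt, parse_output, and predict, and the max_length value, are likewise illustrative.

```python
import json


def build_prompt(text, schema, examples=()):
    """Assemble the prompt string (assumed format) from template, examples, and text."""
    # Re-serialize the schema so its indentation is consistent.
    template = json.dumps(json.loads(schema), indent=4)
    prompt = "<|input|>\n### Template:\n" + template + "\n"
    for example in examples:
        if example:  # skip empty placeholders
            prompt += "### Example:\n" + json.dumps(json.loads(example), indent=4) + "\n"
    prompt += "### Text:\n" + text + "\n<|output|>\n"
    return prompt


def parse_output(decoded):
    """Keep what follows the first output marker, up to the assumed end marker."""
    _, _, tail = decoded.partition("<|output|>")
    return tail.split("<|end-output|>")[0].strip()


def predict(model, tokenizer, text, schema, examples=()):
    """Run one extraction; needs a loaded transformers model and tokenizer."""
    prompt = build_prompt(text, schema, examples)
    # max_length=4000 is an assumed truncation limit, not a documented one.
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4000)
    inputs = inputs.to(model.device)
    output_ids = model.generate(**inputs)
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return parse_output(decoded)


# Building the prompt needs no model, so it can be inspected on its own.
demo_schema = '{"Model": {"Name": "", "License": ""}}'
demo_text = "NuExtract is released under the MIT License."
print(build_prompt(demo_text, demo_schema))
```

Keeping prompt construction and output parsing separate from the generation call makes both easy to check without a GPU. With the model and tokenizer loaded as in the earlier step, a call would look like predict(model, tokenizer, demo_text, demo_schema).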
License
NuExtract is released under the MIT License.