Jivi-RadX-v1

Introduction

Jivi-RadX-v1 is an advanced visual language model developed for interpreting radiographic X-ray images in the healthcare domain. It offers accurate and insightful responses to diagnostic and analytical queries, aiding clinicians and researchers in medical imaging analysis.

Architecture

Jivi-RadX-v1 is based on the Llama 3.1 text-only model, an auto-regressive language model with an optimized transformer architecture. The model integrates a separately trained vision encoder and vision projector to handle image recognition tasks.
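The flow of data through such a model can be illustrated with a toy sketch: image patches pass through the vision encoder, the projector maps the resulting features into the language model's embedding space, and the projected features are merged with the text token embeddings before auto-regressive generation. All names and dimensions below are illustrative, not the model's actual implementation.

```python
def vision_encoder(image_patches):
    # Maps each image patch to a feature vector (toy: 4-dim features).
    return [[float(p)] * 4 for p in image_patches]

def vision_projector(patch_features, dim=8):
    # Projects vision features into the language model's embedding space
    # (toy: widen each 4-dim feature to the LM's 8-dim embedding size).
    return [f * (dim // len(f)) for f in patch_features]

def language_model(embeddings):
    # The auto-regressive LM consumes the merged sequence
    # (toy: just report the sequence length it would attend over).
    return len(embeddings)

def forward(image_patches, text_token_embeddings):
    # 1. Encode the image, 2. project into text space, 3. merge with text.
    projected = vision_projector(vision_encoder(image_patches))
    merged = projected + text_token_embeddings
    return language_model(merged)
```

For example, three image patches merged with five text tokens yield a sequence of eight embeddings: `forward([1, 2, 3], [[0.0] * 8] * 5)` returns `8`.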

Training

The model was pretrained on 365,000 medical image and text pairs, with additional instruction tuning on over 280,000 synthetically generated examples. Synthetic data was created using visual LLMs and metadata from X-ray images to produce rich captions for training.

Guide: Running Locally

  1. Install Dependencies: Ensure `transformers>=4.45.2` is installed, along with `torch`, `pillow`, and `requests`.

  2. Set Up Environment:

    import requests
    import torch
    from PIL import Image
    from transformers import (AutoProcessor, AutoTokenizer, LlavaForConditionalGeneration)
    
    model_id = "jiviai/Jivi-RadX-v1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, attn_implementation="eager", device_map="cuda", torch_dtype=torch.float16
    )
    
  3. Process Input and Generate Output:

    url = "https://jarvis-01j48hrq5383vpdk8csp3r60xa.s3.amazonaws.com/dev/MISC/2024-10-03/01J991DRQ2G5TAB24A9QNMFAXN.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    # Define the prompt; <image> marks where the image features are inserted.
    # This LLaVA-style template is an assumption -- check the model card for
    # the exact chat template.
    prompt = "USER: <image>\nDescribe this X-ray image. ASSISTANT:"
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(
        model.device, dtype=model.dtype
    )
    generate_ids = model.generate(**inputs, max_new_tokens=30)
    output = processor.decode(
        generate_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    print(output)
    
  4. Cloud GPU Suggestion: For optimal performance, consider using cloud GPU services like AWS, GCP, or Azure.
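The steps above can be wrapped in a small reusable helper. The prompt template is an assumption (a generic LLaVA-style `USER: <image>` format); verify it against the model card's chat template. The heavy imports live inside the inference function so the lightweight prompt helper can be used without `torch` installed, and the model call only runs when the script is executed directly.

```python
MODEL_ID = "jiviai/Jivi-RadX-v1"

def build_prompt(question: str) -> str:
    # Assumed LLaVA-style template; verify against the model's chat template.
    return f"USER: <image>\n{question} ASSISTANT:"

def describe_xray(url: str, question: str, max_new_tokens: int = 30) -> str:
    # Imported here so build_prompt() stays usable without GPU dependencies.
    import requests
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = LlavaForConditionalGeneration.from_pretrained(
        MODEL_ID,
        attn_implementation="eager",
        device_map="cuda",
        torch_dtype=torch.float16,
    )
    image = Image.open(requests.get(url, stream=True).raw)
    inputs = processor(
        text=build_prompt(question), images=image, return_tensors="pt"
    ).to(model.device, dtype=model.dtype)
    generate_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(generate_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    url = "https://jarvis-01j48hrq5383vpdk8csp3r60xa.s3.amazonaws.com/dev/MISC/2024-10-03/01J991DRQ2G5TAB24A9QNMFAXN.jpg"
    print(describe_xray(url, "Describe this X-ray image."))
```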

License

The data, code, and model checkpoints are intended for research on visual-language processing and reproducibility of experimental results. They are not suitable for clinical care or decision-making purposes.
