Jivi Rad X v1
jiviaiIntroduction
Jivi-RadX-v1 is an advanced visual language model developed for interpreting radiographic X-ray images in the healthcare domain. It offers accurate and insightful responses to diagnostic and analytical queries, aiding clinicians and researchers in medical imaging analysis.
Architecture
Jivi-RadX-v1 is based on the Llama 3.1 text-only model, an auto-regressive language model with an optimized transformer architecture. The model integrates a separately trained vision encoder and vision projector to handle image recognition tasks.
Training
The model was pretrained on 365,000 medical image and text pairs, with additional instruction tuning on over 280,000 synthetically generated examples. Synthetic data was created using visual LLMs and metadata from X-ray images to produce rich captions for training.
Guide: Running Locally
-
Install Dependencies: Ensure
transformers>=4.45.2
is installed. -
Set Up Environment:
import requests import torch from PIL import Image from transformers import (AutoProcessor, AutoTokenizer, LlavaForConditionalGeneration) model_id = "jiviai/Jivi-RadX-v1" tokenizer = AutoTokenizer.from_pretrained(model_id) processor = AutoProcessor.from_pretrained(model_id) model = LlavaForConditionalGeneration.from_pretrained( model_id, attn_implementation="eager", device_map="cuda", torch_dtype=torch.float16 )
-
Process Input and Generate Output:
url = "https://jarvis-01j48hrq5383vpdk8csp3r60xa.s3.amazonaws.com/dev/MISC/2024-10-03/01J991DRQ2G5TAB24A9QNMFAXN.jpg" image = Image.open(requests.get(url, stream=True).raw) inputs = processor(text=prompt, images=image, return_tensors="pt").to( model.device, dtype=model.dtype ) generate_ids = model.generate(**inputs, max_new_tokens=30) output = processor.decode( generate_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False ) print(output)
-
Cloud GPU Suggestion: For optimal performance, consider using cloud GPU services like AWS, GCP, or Azure.
License
The data, code, and model checkpoints are intended for research on visual-language processing and reproducibility of experimental results. They are not suitable for clinical care or decision-making purposes.